Statistical Methods for the Quantitative Genetic Analysis of High-Throughput Phenotyping Data

Morota, Gota; Jarquin, Diego; Campbell, Malachy T.; Iwata, Hiroyoshi

doi:10.1007/978-1-0716-2537-8_21


Part of the book series: Methods in Molecular Biology (MIMB, volume 2539)


Abstract

The advent of plant phenomics, coupled with the wealth of genotypic data generated by next-generation sequencing technologies, provides exciting new resources for investigations into and improvement of complex traits. However, these new technologies also bring new challenges in quantitative genetics, namely, a need for the development of robust frameworks that can accommodate these high-dimensional data. In this chapter, we describe methods for the statistical analysis of high-throughput phenotyping (HTP) data with the goal of enhancing the prediction accuracy of genomic selection (GS). Following the Introduction in Sec. 1, Sec. 2 discusses field-based HTP, including the use of unoccupied aerial vehicles and light detection and ranging, as well as how we can achieve increased genetic gain by utilizing image data derived from HTP. Section 3 considers extending commonly used GS models to integrate HTP data as covariates associated with the principal trait response, such as yield. Particular focus is placed on single-trait, multi-trait, and genotype by environment interaction models. One unique aspect of HTP data is that phenomics platforms often produce large-scale data with high spatial and temporal resolution for capturing dynamic growth, development, and stress responses. Section 4 discusses the utility of a random regression model for performing longitudinal modeling. The chapter concludes with a discussion of some standing issues.


1 Introduction

The predicted rise in global temperatures, increased variability of precipitation events, and increased competition for freshwater resources and arable land threaten to place unique constraints on global agriculture. Plant breeders in the twenty-first century will need to develop cultivars that are both high-yielding and resilient to climate change. The evaluation and development of breeding material requires a multifaceted approach, necessitating the consideration of multiple complex and often interdependent traits. Successful germplasm development depends not only on the increase in the performance of breeding material that is achieved each cycle but also on the amount of time before a cultivar is released to the end-users. Moreover, the development of elite cultivars tolerant to abiotic stresses requires careful consideration of a suite of morphological and physiological traits that facilitate adaptation (e.g., plasticity and stability) to a range of environmental conditions. Thus, genetic improvement in this respect is a highly demanding process that requires extensive phenotypic evaluation in multiple environments. Advancements in sequencing have led to new genomic tools and have opened new avenues of research that aid breeders in their selection procedure. For instance, next-generation sequencing techniques such as genotyping-by-sequencing [1] have significantly increased the number of markers discovered and the number of individuals that can be sequenced, providing a cost-efficient tool for breeders to obtain the genotypic profiles of individuals.

In parallel with next-generation sequencing advancements, new statistical methods have been developed to enable utilization of the vast amount of available genomic information for selection purposes. This is known as genomic selection (GS), and its fundamental concept was first introduced by Meuwissen et al. [2]. GS predicts the performance of unobserved individuals based on the linkage disequilibrium between markers and causal loci and the genomic relatedness between observed and unobserved individuals. It has been shown that GS can increase genetic gain by reducing the number of cycles and the number of progeny that need to be phenotypically tested, thus reducing the cost of a breeding pipeline. Since Meuwissen et al. [2], there have been improvements in the prediction accuracy of GS through the incorporation of pedigree information [3,4,5,6], environmental covariates, and genotype by environment interactions [7,8,9,10].

One of the main advantages of GS over phenotypic selection is that phenotypic information is not required for the validation set. However, the acquisition of accurate phenotypic information is still a crucial component for the training or calibration set in the model building process. In other words, the phenotypic information of selection candidates is not used directly for selection, but the predictive ability of the models is negatively affected by the absence of accurate phenotypes. Obtaining precise phenotypic values is not trivial, but it is a critical part of genome-enabled breeding [11,12,13].

In recent years, high-throughput phenotyping (HTP) has become an emerging technology that can assist breeders in improving selection procedures and developing commercial cultivars more rapidly and efficiently [14]. In particular, image-based plant phenotyping enables frequent, non-destructive evaluation of multiple traits for a large number of plants with high precision. Image-based phenotyping offers several advantages, including being generally non-destructive, requiring low or no physical human labor input, being cost effective, and the ability to measure multiple traits at the same time in different locations and at different developmental growth stages [11, 15].

A wide range of HTP platforms has been developed to provide dense phenotypic information [16]. Remote sensing and robotic systems developed in greenhouses and growth chambers have a high initial cost but can be fully automated. Alternatively, HTP data can be collected in the field, as described later. In general, these field-based systems are also associated with a high initial cost and require a well-trained operator for collecting high-quality data [15]. Reynolds et al. [15] characterized and compared the available platforms in terms of associated costs and purposes. Despite such costs, a growing number of breeding programs are utilizing HTP platforms to better understand the genetics of quantitative traits and leverage these high-dimensional data to enhance selection. Specifically, HTP can be used both to generate dependent phenotypic variables for the training set in prediction models and to provide additional information on genetic predictor variables in GS models, thereby improving prediction accuracies for conventional breeding targets, such as yield. Thus, this type of data can serve two main purposes: (1) as a primary trait response (e.g., plant height, canopy coverage, and number of leaves), and (2) as a covariate associated with the target trait response (e.g., yield). We discuss these points further in the following sections.

2 Field-Based High-Throughput Phenotyping Using UAV

In this section, we show that HTP accelerates plant breeding by improving the response to selection [17]. HTP methods allow us to measure plants efficiently and accurately via automatic or semi-automatic analysis of data collected by cameras and sensors [18]. Methods for measuring plants cultivated in a field are collectively known as field-based HTP. Field-based HTP enables the measurement of a large number of plots in an experimental field using cameras and sensors mounted on different platforms, such as unoccupied aerial vehicles (UAV) [19], carts [20], tractors [21], and gantry cranes. Field-based HTP not only improves the efficiency and accuracy of phenotyping of plants in a field but also makes it possible to evaluate traits that are difficult to measure with conventional phenotyping methods. In particular, the UAV is one of the most cost-effective and easy-to-use platforms [22, 23]. Although the type of camera or sensor that can be mounted on a UAV is restricted by the payload capacity of the UAV, light-weight and small cameras and sensors have been developed, and their precision and cost-efficiency have rapidly improved in recent years. The UAV is commonly equipped with digital cameras, multispectral cameras, and thermal infrared imagers in field-based HTP [24, 25]. In contrast, hyperspectral cameras and LiDAR (light detection and ranging) are currently not commonly mounted on a UAV but are mainly ground-based HTP platforms [26,27,28,29,30] because of their weight, size, and cost. The commercialization of the UAV and related equipment is progressing in various fields, and various measurement devices will be available for HTP in the near future.

Plant characteristics that can be measured using UAV are roughly divided into three types of traits: (1) geometric traits, (2) spectral traits, and (3) physiological traits. For geometric traits, plant height, canopy cover, and canopy volume are measured mainly with RGB cameras or multispectral cameras [26, 28,29,30,31,32,33,34,35,36]. To measure these traits, a method called Structure from Motion (SfM) is used to estimate the three-dimensional (3D) structures of plants or plant canopies from a sequence of images acquired by a UAV. The structure is obtained using a set of data points, called a point cloud, in a 3D space. The 3D coordinate information of a point cloud is converted into a digital surface model (DSM) and an orthomosaic image. DSM is used for measuring plant 3D structural traits, such as plant height and canopy volume, while orthomosaic images are used for traits evaluated from above the ground, such as canopy cover. Lodging of plants can also be measured by DSM analysis [37]. The numbers and locations of flowers, blooms, and heads are also measured as geometric traits [38, 39]. In these studies, image-based machine learning has been used for the detection of target objects (i.e., flowers, blooms, or heads) from images acquired by UAV. Guo et al. [38] employed a two-step machine learning method for the detection of sorghum heads and attained high accuracy on various genotypes with different head morphologies and at different growth stages. Xu et al. [39] used a convolutional neural network to detect cotton blooms and estimated the 3D coordinates of the blooms using a dense point cloud constructed by SfM. These two studies demonstrated the potential of the combinatory use of image-based machine learning and HTP. Moreover, these studies suggest that simple but labor-intensive measurements, such as the monitoring of flowering and heading, can be performed on a much larger scale with HTP and image-based machine learning than with conventional methods.

For spectral traits, vegetation indices (VI) calculated from multispectral images acquired by UAV are used for evaluating vegetation properties, such as plant structure, biochemistry, and plant physiological and stress status [31, 33, 34, 40,41,42,43,44,45,46,47,48]. A large number of VIs have been proposed and have been used in ground-based platforms, aircraft, and satellite remote sensing. The fine spatial resolution of a UAV enables the removal of soil and shadow pixels from images and can improve the estimation of vegetative properties. Jay et al. [47] used 6-band multispectral cameras to evaluate the structural and biochemical plant traits of green fraction, green area index, leaf chlorophyll content, and canopy chlorophyll and nitrogen contents, showing that the fine spatial resolution of the UAV always improved the estimation accuracy of these traits. Although multispectral images allow us to estimate various VIs better than RGB images, multispectral cameras are usually more expensive and have lower resolution than RGB cameras. To resolve this issue, Khan et al. [49] proposed a method for model-based estimation of VIs using RGB images. In this method, mean VI values were computed from the near infrared and red channels of corresponding plots, and then a deep neural network was trained with the RGB images as the input source and the VI values as the target output. A similar approach can be applied to the estimation of hyperspectral VIs from multispectral or RGB images and will be useful because hyperspectral cameras are usually much more expensive than multispectral cameras.
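
As a rough illustration of how a plot-level VI can be computed after removing soil and shadow pixels, the following sketch calculates the mean NDVI of a plot from near infrared and red reflectance values. The function names, the fixed NDVI threshold used to discard background pixels, and the toy reflectance values are assumptions made for this example; they are not taken from the studies cited above, which may use different masking procedures.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    # A pixel with no reflectance in either band carries no signal; return 0.0.
    return (nir - red) / denom if denom != 0 else 0.0


def plot_mean_ndvi(pixels, soil_threshold=0.2):
    """Mean NDVI of a plot after dropping likely soil or shadow pixels.

    pixels: iterable of (nir, red) reflectance pairs.
    soil_threshold: hypothetical cutoff; pixels with NDVI below it are
    treated as background and excluded (an assumption of this sketch).
    Returns None when no vegetation pixel remains.
    """
    values = [ndvi(nir, red) for nir, red in pixels]
    kept = [v for v in values if v >= soil_threshold]
    return sum(kept) / len(kept) if kept else None


# Toy plot: two vegetation pixels and one soil-like pixel (hypothetical values).
plot_pixels = [(0.5, 0.1), (0.6, 0.2), (0.2, 0.18)]
mean_vi = plot_mean_ndvi(plot_pixels)
```

With finer spatial resolution, more pixels fall cleanly on either vegetation or background, which is why masking of this kind becomes more effective with UAV imagery.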

As for physiological traits, traits such as leaf chlorophyll content, protein content, biomass, crop vigor, nutrition status, and water status are measured by various methods including 3D construction and spectral VIs. A method that is specific to physiological traits is thermal infrared imaging, which enables the measurement of canopy temperature and can be used as a tool to indirectly evaluate the transpiration rate of a plant. Tattaris et al. [24] used a thermal infrared camera and a multispectral camera coupled with UAV to measure canopy temperature and the VI of wheat and found that data acquired by UAV generally exhibited stronger correlations with yield and biomass than data obtained from ground-based phenotyping. Ludovisi et al. [50] applied thermal infrared imaging to measure the canopy temperature of black poplar using UAV and found that the canopy temperature showed a good correlation with ground-truth stomatal conductance. Although the canopy temperature is an important indicator of stress status, it is extremely sensitive to small environmental changes, making it difficult to assess through slow ground-based methods [37]. HTP using UAV provides a good solution for this problem.

2.1 Application of HTP in Breeding Populations

When selecting breeding populations using HTP, two relatively simple methods are considered: (1) indirect selection and (2) index selection. Another method, selection based on prediction with HTP and genomic information, will be described later. When genetic correlations exist, selection for one trait will cause corresponding changes in other traits that are correlated [51]. This change in response due to genetic correlation is called a correlated response and may be caused by pleiotropy or linkage disequilibrium.

In the indirect selection method, a target trait, X, is selected indirectly by selection for trait Y, which has a genetic correlation with trait X and can be measured by HTP. It is possible to improve the selection efficiency of the target trait, the measurement of which would be costly, time consuming, or labor intensive, with traits readily measured by HTP. For example, Madec et al. [26] measured wheat plant height with HTP using LiDAR and UAV and found that it was highly correlated with the plant height measured at the ground level. They also demonstrated that heading date could be estimated based on a growth curve of plant height measured by LiDAR. Kyratzis et al. [44] evaluated the potential use of VIs acquired by UAV for durum wheat phenotyping and found that one index was significantly correlated with grain yield.

In the index selection method, a target trait, X, is selected based on an index calculated from phenotypes of a set of m traits, Ys, related to the target trait. The simplest index is a linear combination expressed as

$$ I=\sum \limits_{j=1}^m{b}_j{y}_j $$

where $ b_j $ and $ y_j $ are the weight and phenotypic value of trait $ Y_j $, respectively. If we consider $ b_j $ and $ y_j $ as the effect and state of marker j in a set of genome-wide markers, index selection becomes GS. The weight, which represents the relative importance of each trait, can be determined by multivariate regression. For example, Kefauver et al. [25] built a model regressing the grain yield on VIs acquired by HTP using UAV with stepwise regression and found that the regression model explained 77.8% of the grain yield variation. Yu et al. [27] performed hyperspectral imaging of a wheat canopy and used the resulting data to detect Septoria tritici blotch disease and to quantify the severity of infection. They used partial least squares regression to build a prediction model for severity and found that the accuracy of prediction (correlation between observed and predicted values) was 0.38–0.60 for three disease metrics. Non-linear relationships between trait X and a set of traits Ys can also be modeled in a selection index. Various types of models, including known and ad hoc machine learning models, can be used for building an index. Thorp et al. [52] proposed a method for deriving daily evapotranspiration based on a daily soil water balance model named FAO-56 [53], which was derived from an index acquired by HTP using UAV, to evaluate and improve the crop water use efficiency of cotton varieties. Collectively, indirect or index selection based on traits measured by HTP has strong potential to streamline the selection of important agronomic traits, such as plant height, heading date, grain yield, and disease resistance.
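
The linear index defined above can be sketched in a few lines. In this minimal example the weights are simply given rather than estimated by regression, and the trait values, line names, and function names are hypothetical placeholders chosen for illustration.

```python
def selection_index(weights, phenotypes):
    """Return I = sum_j b_j * y_j for one selection candidate."""
    if len(weights) != len(phenotypes):
        raise ValueError("weights and phenotypes must have the same length")
    return sum(b * y for b, y in zip(weights, phenotypes))


def rank_by_index(weights, candidates):
    """Return candidate names ordered from the highest to the lowest index."""
    scores = {name: selection_index(weights, y) for name, y in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical weights b_j for m = 2 HTP traits (e.g., a VI and canopy cover).
weights = [2.0, 0.5]

# Hypothetical phenotypic values y_j of three breeding lines.
candidates = {
    "line_A": [0.70, 0.80],
    "line_B": [0.65, 0.95],
    "line_C": [0.80, 0.60],
}

ranking = rank_by_index(weights, candidates)
```

In practice the weights would come from a fitted model (for example, a regression of the target trait on the HTP traits), and the same ranking step would then be applied to the predicted index values.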

2.2 Genetic Gain in HTP-Based Selection

HTP-based selection and GS can accelerate plant breeding by improving the efficiency of selection. Response to selection is an index for evaluating the efficiency of selection [54]. The response to selection R is defined as the difference between the mean phenotypic values ($ {\bar{y}}_o $) of progeny generated from the selected parents and the mean phenotypic values ($ {\bar{y}}_p $) of the parental population before selection.

$$ R={\bar{y}}_o-{\bar{y}}_p. $$

If we denote the heritability of the trait targeted by selection as $ h^2 $ and define the selection differential as the product of the phenotypic standard deviation $ \sigma_p $ and the selection intensity $ i $ in the parent population, the response to selection can be written as

$$ R=i{h}^2{\sigma}_p. $$

This is an important formula in breeding known as the “breeder’s equation.” If a breeder knows the heritability of the target trait $ h^2 $ and the standard deviation of the phenotype $ \sigma_p $ in the parent population, it is possible to calculate the expected response to selection $ R $ under intensity $ i $. Using the definition of heritability, $ {h}^2={\sigma}_g^2/ {\sigma}_p^2 $, we can rewrite the formula as

$$ R= ih{\sigma}_g, $$

where $ \sigma_g $ is the square root of the genetic variance in the parent population.
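
The step between the two forms of the breeder's equation can be made explicit. Because $ h={\sigma}_g/{\sigma}_p $, one factor of $ h $ cancels the phenotypic standard deviation:

$$ R=i{h}^2{\sigma}_p= ih\left(\frac{\sigma_g}{\sigma_p}\right){\sigma}_p= ih{\sigma}_g. $$

As a purely illustrative check with hypothetical values $ i=1.4 $, $ {h}^2=0.36 $ (so $ h=0.6 $), and $ {\sigma}_p=10 $, we have $ {\sigma}_g=h{\sigma}_p=6 $, and both forms give the same response:

$$ R=1.4\times 0.36\times 10=5.04,\kern1em R=1.4\times 0.6\times 6=5.04. $$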

Now we consider the case in which we select trait X indirectly by selecting for trait Y, measured with HTP. In this case, the response to selection of the indirect selection of trait X with trait Y is

$$ {R}_{XY}={i}_Y{h}_Y{r}_{XY}{\sigma}_{gX}, $$

where $ i_Y $ is the selection intensity of trait Y, $ h_Y $ is the square root of the heritability of trait Y, $ r_{XY} $ is the genetic correlation between trait X and trait Y, and $ \sigma_{gX} $ is the square root of the genetic variance of trait X in the parent population. To improve the efficiency of selection with HTP, this value should be larger than the response to direct selection of trait X, i.e., $ R_X = i_X h_X \sigma_{gX} $. That is, the condition for improving the selection efficiency with HTP is

$$ {i}_Y{h}_Y{r}_{XY}>{i}_X{h}_X. $$

When the selection intensities of the two traits are the same ($ i_Y = i_X $), the following two conditions should be satisfied: (1) trait Y measured by HTP has a higher heritability than trait X, and (2) the genetic correlation between trait X and trait Y is high. With HTP, however, it is often possible to evaluate a larger number of genotypes (strains or individuals) than with direct selection of trait X using a conventional phenotyping method. Therefore, the selection intensity of trait Y can be increased compared to the selection intensity of trait X. If $ i_Y > i_X $, even when the heritability of trait Y is not larger than that of trait X, it may be possible to perform indirect selection on trait X with higher efficiency than that of direct selection.
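
The comparison between direct and indirect selection can be illustrated numerically. The sketch below evaluates the two response formulas given above; all parameter values are hypothetical and were chosen only to show how a higher selection intensity for the HTP trait can offset an equal heritability and an imperfect genetic correlation.

```python
import math


def direct_response(i_x, h2_x, sigma_gx):
    """R_X = i_X * h_X * sigma_gX, where h_X is the square root of h2_x."""
    return i_x * math.sqrt(h2_x) * sigma_gx


def indirect_response(i_y, h2_y, r_xy, sigma_gx):
    """R_XY = i_Y * h_Y * r_XY * sigma_gX, where h_Y is the square root of h2_y."""
    return i_y * math.sqrt(h2_y) * r_xy * sigma_gx


# Hypothetical values: both traits have heritability 0.36, and the genetic
# correlation between the HTP trait Y and the target trait X is 0.8.
sigma_gx = 1.0
r_direct = direct_response(i_x=1.4, h2_x=0.36, sigma_gx=sigma_gx)

# Same selection intensity: indirect selection responds less than direct selection.
r_indirect_same = indirect_response(i_y=1.4, h2_y=0.36, r_xy=0.8, sigma_gx=sigma_gx)

# Higher intensity made possible by HTP: indirect selection now responds more.
r_indirect_more = indirect_response(i_y=2.0, h2_y=0.36, r_xy=0.8, sigma_gx=sigma_gx)
```

Under these assumed values the direct response is about 0.84, whereas the indirect response is about 0.67 at equal intensity and about 0.96 at the higher intensity.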

Index selection with HTP and GS both involve indirect selection of trait X based on the index I, which is calculated based on traits measured by HTP or genome-wide marker genotypes. The response to selection is represented as

$$ {R}_{XI}={i}_I{r}_{XI}{\sigma}_{gX}, $$

where $ i_I $ is the selection intensity of the index I and $ r_{XI} $ is the accuracy of selection of trait X based on index I. The condition that the response to index selection is greater than the response to direct selection of trait X is

$$ {i}_I{r}_{XI}>{i}_X{h}_X. $$

When the selection intensities of index I and trait X are the same ($ i_I = i_X $) and the accuracy $ r_{XI} $ of selection of trait X based on index I exceeds the square root of the heritability of trait X, $ h_X $, the efficiency of index selection exceeds the efficiency of direct selection of trait X. As in the case of indirect selection using trait Y, if $ i_I > i_X $, index selection can have a higher efficiency than direct selection even when the accuracy $ r_{XI} $ does not exceed the square root of the heritability of trait X.

When we consider the efficiency of a breeding program, it is important to evaluate the genetic gain per unit time. Dividing the response to direct selection $ R_X $ by the time $ \delta_X $ required for one cycle of selection, we obtain

$$ \Delta {G}_X=\frac{i_X{h}_X{\sigma}_{gX}}{\delta_X}, $$

where $ \Delta G_X $ is the genetic gain per unit time. The genetic gain of indirect selection of trait X with trait Y is

$$ \Delta {G}_{XY}=\frac{i_Y{h}_Y{r}_{XY}{\sigma}_{gX}}{\delta_Y}, $$

and the genetic gain of index selection of trait X with index I is

$$ \Delta {G}_{XI}=\frac{i_I{r}_{XI}{\sigma}_{gX}}{\delta_I}. $$

Here, $ \delta_Y $ and $ \delta_I $ are the times required for one cycle of indirect and index selection, respectively. The time required for one cycle of selection can be shorter for trait Y and index I than for trait X. For example, the yield and quality of a grain crop are usually evaluated with multiple plants on a plot-by-plot basis. However, in indirect and index selection, it may be possible to perform selection on a single plant basis in earlier generations, such as second-generation hybrids (F₂). In such a case, $ \delta_Y < \delta_X $ (or $ \delta_I < \delta_X $), and even when the response to selection $ R_{XY} $ or $ R_{XI} $ is smaller than the response to selection $ R_X $, the genetic gain per unit time can become greater under indirect and index selection than under direct selection.
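
The effect of cycle time can be seen with a small calculation. The following sketch divides a response to selection by the length of one selection cycle; the responses and cycle lengths are hypothetical numbers used only to show that a smaller response per cycle can still yield a larger gain per unit time.

```python
def gain_per_unit_time(response, cycle_time):
    """Genetic gain per unit time: response to selection divided by cycle length."""
    if cycle_time <= 0:
        raise ValueError("cycle_time must be positive")
    return response / cycle_time


# Hypothetical direct selection: larger response, but a 4-year cycle.
gain_direct = gain_per_unit_time(response=0.84, cycle_time=4.0)

# Hypothetical index selection: smaller response, but a 2-year cycle.
gain_index = gain_per_unit_time(response=0.60, cycle_time=2.0)

index_is_faster = gain_index > gain_direct
```

Here the index-based scheme gains about 0.30 units per year compared with about 0.21 for direct selection, despite its smaller response per cycle.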

As described above, the efficiency of selection can be improved by taking advantage of HTP, especially in terms of improvements in selection intensity and the time required for one cycle of selection. Field-based HTP is useful for increasing selection intensity because of its scalability, while HTP in the greenhouse is good for reducing the time required for one cycle of selection because it is often performed on a single-plant basis and year-round. In the application of HTP in plant breeding, the factors described earlier should be taken into account to optimize selection methods for target traits.

2.3 Use of HTP for GWAS and GS

Although HTP alone is expected to improve the response to selection, response to selection can be further improved by using HTP in combination with genome-wide association studies (GWAS) and GS. HTP with UAV is particularly suited for this purpose, as it can measure a large number of small- to medium-sized plots in which plants are cultivated. HTP with UAV has been applied to the evaluation of a large number of genotypes (germplasm accessions, varieties, and breeding lines) in many species, including wheat [26, 40, 42, 55, 56], maize [31, 33], sorghum [32, 36, 38], and black poplar [50]. Condorelli et al. [42] performed GWAS with 248 elite durum wheat lines to compare the results obtained with two UAVs and a ground-based method to measure a VI (Normalized Difference Vegetation Index, NDVI). More associations were detected by HTP using UAV than with the ground-based method, suggesting that HTP using UAV has a greater ability to detect associations than the ground-based method. Spindel et al. [36] undertook GWAS with 648 diverse sorghum lines for 460 combinations of traits, treatments, time points, and locations. Four traits related to biomass, plant height, and leaf area were measured by HTP using UAV. In total, 213 high-quality, replicated, and conserved associations were detected in genomic intervals, including many strong candidate genes. Watanabe et al. [32] measured the height of 115 sorghum germplasm accessions with HTP using UAV and evaluated the potential of HTP to provide phenotypic training data in a GS model. Although the phenotypic correlation between the two measurement methods was not high, the genomic predictions of plant height based on HTP using UAV were highly correlated with those based on manual measurements. These results suggest the considerable potential of HTP using UAV for genomic-assisted breeding through GWAS and GS.

To successfully combine HTP with GWAS or GS, a novel viewpoint different from the analysis of conventional phenotypic data is necessary. Since HTP enables non-destructive and frequent measurements for large-scale field tests, a target trait can be measured as high-density time series data and as high-density data with coordinate information. Thus, spatial-temporal continuity and change can be taken into account in GWAS and GS models. For instance, Elias et al. [57] fitted a model with a spatial kernel as well as a kernel-based genomic relationship matrix to cassava agronomic trait data to account for the spatial heterogeneity in the field and showed that the prediction accuracy increased after accounting for the spatial variation. Moreover, multiple sensors are commonly employed in HTP, each of which can acquire high-dimensional data (e.g., hyperspectral images). Thus, for GWAS and GS using phenotypic data collected by HTP, it is necessary to consider the high dimensionality of the data and the large number of data points. Spindel et al. [36] conducted a GWAS on a number of features collected with HTP using UAV and constructed a method and pipeline to fuse and organize numerous GWAS results. Phenotypic data measured by HTP can also be used in the prediction of genotypic values of a target trait by leveraging genetic correlations between the target trait and traits measured by HTP. Rutkoski et al. [56] proposed a method for predicting a target trait with correlated HTP traits, as described in the next section.

3 Integration of HTP Data into GS

3.1 Single-Trait Analysis

Recently, there have been several studies that have integrated genomic data and HTP data for prediction purposes in several crops using different modeling techniques [13, 56, 58,59,60,61,62,63]. The integration of genomic and HTP data provides opportunities to improve existing GS models, thus enabling breeders to select their material more accurately and increase genetic gain. We summarize some key methods developed for integrating high-throughput genomic and HTP information for the purpose of increasing the accuracy of prediction by extending the standard GS models.

We can include secondary image traits in a quantitative genetics model using two model parameterizations. The first model explains the ith phenotypic observation as the sum of an intercept $ \mu $ common to all observations, a linear combination of $ p $ markers $ x_{ij} $ and their corresponding marker effects $ b_j $, a linear combination of $ Q $ secondary traits $ s_{iq} $ and their corresponding effects $ a_q $, and a residual $ \varepsilon_i $ as follows:

$$ {y}_i=\mu +\sum \limits_{j=1}^p{x}_{ij}{b}_j+\sum \limits_{q=1}^Q{s}_{iq}{a}_q+{\varepsilon}_i. $$

The second model parameterization is based on covariance structures and can be obtained from the previous model by assuming that the marker effects $ b_j $ and the secondary-trait effects $ a_q $ are independent and identically distributed draws from normal densities of the form $ {b}_j\sim N\left(0,{\sigma}_b^2\right) $ and $ {a}_q\sim N\left(0,{\sigma}_a^2\right) $. Then, $ {g}_i={\sum}_{j=1}^p{x}_{ij}{b}_j $ and $ {w}_i={\sum}_{q=1}^Q{s}_{iq}{a}_q $ are genetic and environmental values of the ith genotype using information from genomics and secondary traits. From properties of the multivariate normal density, the vectors of these values are also normally distributed, such that $ \mathbf{g}=\left\{{g}_i\right\}\sim N\left(\mathbf{0},\mathbf{G}{\sigma}_g^2\right) $ and $ \mathbf{w}=\left\{{w}_i\right\}\sim N\left(\mathbf{0},\mathbf{C}{\sigma}_A^2\right) $, where $ \mathbf{G}=\mathbf{X}{\mathbf{X}}^{\prime }/p $ is a covariance matrix whose entries describe genomic similarities between pairs of genotypes; $ \mathbf{X} $ is the matrix of molecular markers of order $ n\times p $; $ {\sigma}_g^2=p\times {\sigma}_b^2 $; $ \mathbf{C}=\mathbf{S}{\mathbf{S}}^{\prime }/Q $ is a covariance matrix whose entries describe phenotypic similarities based on image secondary traits data for each pair of observations; $ \mathbf{S} $ is the matrix of secondary traits of order $ n\times Q $; and $ {\sigma}_A^2=Q\times {\sigma}_a^2 $. This parameterization assumes that all of the secondary traits contribute equally to explaining the phenotypic variation of the trait of interest. One of the advantages of using this second parameterization is that it is possible to evaluate the contribution of the genomic and HTP components for explaining phenotypic variability by comparing the estimated variance components associated with each of these terms.
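
The two covariance matrices in this parameterization share the same construction, which the following sketch makes concrete. It assumes that the columns of the marker matrix and of the secondary-trait matrix have already been centered (and scaled, if desired); the helper name and the toy matrices are hypothetical and serve only as an illustration.

```python
def linear_kernel(M):
    """Return M M' / k for an n x k matrix M given as a list of rows."""
    k = len(M[0])
    if any(len(row) != k for row in M):
        raise ValueError("all rows must have the same number of columns")
    return [[sum(a * b for a, b in zip(ri, rj)) / k for rj in M] for ri in M]


# Toy marker matrix X (n = 3 genotypes, p = 4 markers), columns centered.
X = [
    [1, -1, 0, 1],
    [0, 1, -1, 0],
    [-1, 0, 1, -1],
]

# Toy secondary-trait matrix S (n = 3, Q = 2 image traits), columns centered.
S = [
    [0.5, -1.0],
    [1.0, 0.0],
    [-1.5, 1.0],
]

G = linear_kernel(X)  # genomic similarities, G = XX'/p
C = linear_kernel(S)  # similarities based on secondary traits, C = SS'/Q
```

Both matrices are symmetric and of order n × n, so they can be supplied as covariance structures for the two random terms of a mixed model, whose estimated variance components can then be compared.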

The majority of models developed focus on predicting a single trait, namely, grain yield. HTP can measure traits that are shown to be highly correlated with grain yield, such as the spectral reflectance of the canopy and canopy temperature [64]. A VI is used to summarize the spectral reflectance of the canopy scores [61]. However, because the VI is calculated using only a subset of the available wavelengths, it does not take advantage of all of the HTP data. There are several approaches for incorporating all of the HTP wavelengths and the plot-level VI measurements into GS models. Rutkoski et al. [56] showed that the integration of VI and canopy temperature into a genomic best linear unbiased prediction (GBLUP) model could increase the prediction accuracy by 70% compared to that of a univariate baseline model in wheat data. Aguate et al. [65] showed that using bands as predictors increased prediction accuracy over that of VI. They used ordinary least squares, partial least squares, and a Bayesian shrinkage model to incorporate wavelengths into a GS model in maize. A similar observation was made by Montesinos et al. [66], who compared prediction model performance when all of the wavelengths were incorporated with that of a subset of the wavelengths in wheat. They concluded that using all of the wavelengths resulted in higher prediction accuracy.

3.2 Multi-Trait Analysis

Sun et al. [59] predicted grain yield in wheat using a two-step procedure. First, they collected data on canopy temperature and VI as secondary traits (which are correlated with grain yield) and modeled the secondary traits using the genetic marker and environmental effects. Rather than including the secondary traits as covariates in a mixed model for grain yield, they used the secondary traits to develop a multivariate model to predict grain yield, which is the primary trait. The secondary traits were measured in a longitudinal fashion, i.e., at several time points throughout the growing season. They implemented and compared the repeatability, multi-trait, and random regression (RR) models that can be used for modeling longitudinal data. In the second step of the GS, the results from the repeatability, multi-trait, and RR models were used in the form of BLUPs, and a univariate prediction model was compared to bivariate and multivariate models. The univariate model included only grain yield and excluded the secondary traits. In one of the multivariate prediction models, the secondary traits were included both in the training and testing sets, and in the other multivariate prediction model the secondary traits were included only in the training set. The bivariate prediction model included grain yield and one of the secondary traits. Their results showed that the multivariate prediction model that incorporated the secondary traits in both the testing and training sets had an advantage over the other models in terms of prediction accuracy. However, it was not clear which of the first-step models (repeatability, multi-trait, or RR) performed the best because the results depended on the environmental conditions. Nonetheless, the results clearly demonstrated the advantage of using HTP data in GS applications.

Crain et al. [13] compared four models using wheat data: (1) a regular GS model, (2) a univariate model in which grain yield was the response and HTP data were predictors, (3) a model that was the combination of models 1 and 2, and (4) a multi-trait model that included grain yield, canopy temperature, and VI measurements. The results showed that the addition of HTP data increased the prediction accuracy. They found that the multi-trait model exhibited a 7% gain in terms of prediction accuracy, indicating that collecting multiple HTP measurements has the potential to increase genetic gain through the improvement of prediction models. Juliana et al. [62] applied multivariate prediction models to compare standard GS with a pedigree- and HTP-based prediction model. They discussed the situations in which each model can be useful and the importance of implementing the correct models in the correct stage of the breeding pipeline. The authors elaborated on the importance of the family structure and of the secondary HTP traits being highly correlated with the primary phenotypic trait, as these components are influencing factors in prediction performance.

3.3 Genotype by Environment Interaction

Although all of the studies described above considered approaches to integrate HTP into GS, they did not apply interaction effect models. However, there are multiple lines of evidence that GS models with interaction effects have the potential to outperform competing models with only additive effects [67,68,69]. Montesinos et al. [70] presented one of the first studies of HTP showing the impact of including the interaction between hyperspectral bands and environments (band × environment). These authors found that the model with the band × environment interaction outperformed all of the models without this interaction term. Jarquin et al. [63] used prediction models that incorporated line, environment, marker genotype, canopy coverage image information, and their interactions in soybean. They evaluated six main-effects models that included combinations of line, environment, marker genotype, and canopy coverage image information; seven models with two-way interactions among the components; and two models with a three-way interaction between environment, marker genotype, and the canopy coverage data. Under the GBLUP model, they modeled the interaction components as the Hadamard product [71] of the relationship matrices obtained from genetic marker and canopy coverage image information according to the reaction norm model [9]. The model performance was evaluated using three cross-validation (CV) schemes: CV2, CV1, and CV0. CV2 assumed an incomplete field trial, in which some lines are observed in some environments but not in others. CV1 was the case in which one predicts the performance of a new line in environments in which some other lines were evaluated. The goal of CV0 was to predict the performance of already tested lines in untested environments. When grain yield was the target trait, the advantage of including the canopy coverage measurements and the interactions among marker, environment, and canopy coverage measurement was clearly shown.
The highest predictive abilities for CV2 and CV1 were delivered by the models that included a three-way interaction among marker genotype, canopy coverage image data, and environment, while for CV0, the model with interactions between marker genotype and environment, and between canopy coverage image information and environment produced the greatest accuracy. The study also evaluated the effectiveness of canopy coverage image data from early stages and compared it with the case in which the canopy coverage image data was collected throughout the growing season. The results indicated that the information collected in the early stages was sufficient for prediction and that the additional data collected in the later stages did not improve the prediction models significantly. The practical implication of this finding is important, as it shows that the same prediction accuracy can be achieved using fewer resources (time, measurements, and costs).
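The Hadamard-product construction of an interaction kernel described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the marker matrix is simulated, the design is balanced, and the names `Z_line`, `Z_env`, `K_line`, `K_env`, and `K_int` are hypothetical. The same element-wise product would apply to a kernel built from canopy coverage data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_env, n_markers = 4, 2, 50

# Hypothetical marker matrix (lines x markers), centered column-wise.
M = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
M -= M.mean(axis=0)
G = M @ M.T / n_markers                  # genomic relationship among lines

# One record per line-environment combination (balanced design assumed).
line_idx = np.repeat(np.arange(n_lines), n_env)
env_idx = np.tile(np.arange(n_env), n_lines)
Z_line = np.eye(n_lines)[line_idx]       # records x lines incidence matrix
Z_env = np.eye(n_env)[env_idx]           # records x environments incidence matrix

K_line = Z_line @ G @ Z_line.T           # marker kernel expanded to the records
K_env = Z_env @ Z_env.T                  # 1 if two records share an environment
K_int = K_line * K_env                   # Hadamard (element-wise) product
```

Under this construction, two records from different environments have zero covariance in the interaction term, while records from the same environment inherit the genomic relationship between their lines.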

Krause et al. [61] used multi-kernel, multi-environment GBLUP models including genetic marker or pedigree, environmental, and hyperspectral band information for predicting grain yield in wheat. They found that when marker genotype or pedigree data are not available, the main effects model using the hyperspectral band data provided a similar accuracy of prediction compared to the main effects models including marker or pedigree information. Additionally, the model with interactions outperformed the main effects models. Their findings differed from those of Jarquin et al. [63] with regard to the effectiveness of including partial HTP data. They concluded that the prediction accuracy increased when the HTP data from later stages were included. However, this difference is expected, as the crop development for wheat is significantly different from that for soybean. Finally, Montesinos et al. [70] and Montesinos et al. [72] showed the advantages of performing functional analysis for reducing data dimensionality to extract a higher signal-to-noise ratio for each observed value. In addition, Montesinos et al. [70] showed that when the HTP collected over multiple time points are combined using functional analysis, a small increase in prediction accuracy can be achieved relative to that of models that use data from a single time point.

4 Utilizing Image-Derived Longitudinal Traits for Genetic Studies in Plants

The observable phenotype at a given time is the culmination of numerous biological processes that have occurred prior to observation. For example, consider a cereal such as wheat at maturity. The total above-ground biomass can be separated hierarchically into a number of distinct organs. The whole plant can be partitioned into main and auxiliary tillers, which can be further partitioned into leaf blades, leaf sheaths, and stems. This process can proceed further to lower organization levels, separating these organs into tissues and cellular components. At each level, the pattern and timing of development are tightly controlled by complex genetic networks that, at the organ level, control the onset of primordial development and initiation of growth and, at the plant level, the transition from vegetative to reproductive development.

An additional layer of complexity is added to this when the effect of the environment on these processes is considered. The appearance of the plant at maturity is certainly a product of its genetic makeup; however, the processes mentioned above are all tightly linked to the environment. The total size of the plant at maturity is a product of the resources (e.g., light, nutrients, and water) that were available throughout its life cycle. Plants need light and carbon dioxide to produce sugars through photosynthesis. Nutrients are combined with these sugars to generate nucleotides, proteins, and metabolites. Limitations on any of these inputs will slow or stunt growth. In addition to plant growth, the transition between developmental states is also linked directly to the environment. Several studies have shown that drought events can lead to earlier flowering and accelerated post-anthesis development (reviewed by Shavrukov et al. [73]). Therefore, the phenotype is not a static entity. The observable phenotype is the result of dynamic genetic processes, the changing external environment, and the dynamic interplay between the two.

For most genetic applications, plants are often phenotyped at only one or a few time points. These phenotypes are an incredibly informative summation of the processes that have occurred over the life cycle of the plant, and they have been used quite successfully to select for a variety of complex traits. While for many applications these single time point phenotypes may be sufficient, they fail to capture the dynamic processes that have led to the observable phenotype. In most genetic studies, phenotypic evaluation is the largest, most time-consuming activity. Typical genetic studies consist of a mapping population with hundreds to thousands of individuals that are grown in replicates. Thus, for these studies, phenotyping at one time point is often a huge commitment, while evaluation at multiple time points is often unfeasible.

In the last decade, the construction and accessibility of high-throughput phenotyping platforms have provided an attractive means for generating phenotypic data throughout the duration of a study in a non-destructive manner for a number of economically important crop species [14, 32, 37, 74]. These platforms have been successfully deployed in controlled environments to quantify growth and physiological processes in response to drought and salinity [75, 76]. Moreover, with the growing popularity of UAVs in the consumer market, a vast selection of hardware can be obtained at relatively low cost [32]. These can be outfitted with various sensors or cameras and deployed routinely in the field to capture trait development over the growing seasons. In crop species, these temporal phenotypes have been used as covariates in genomic prediction frameworks to improve prediction accuracy for end-point phenotypes, such as yield [13, 59, 63]. However, analysis of the longitudinal trait itself has been largely confined to genetic inference in crop species, while genomic prediction has been applied largely to perennial species [77,78,79]. In the following section, we describe several approaches for genomic prediction of the longitudinal phenotype itself.

4.1 Single Time Point Genetic Inference

A seemingly straightforward approach for assessing dynamic genetic effects underlying longitudinal traits is performing linkage or association analysis at each time point independently [80,81,82,83]. In one of the first applications of HTP for genetic studies in plants, Moore et al. [83] used an image-based platform to quantify root gravitropic responses in Arabidopsis biparental mapping populations. The authors used a step-wise mapping approach at each time point to identify time-dependent quantitative trait loci (QTL) and used a post hoc approach to combine information on QTL detected across multiple time points. The post hoc approach effectively used two metrics to classify QTL into a persistent class, by averaging the LOD scores across time points, and transient QTL, by taking the maximum LOD across all time points. While this post hoc approach effectively combines statistics across time points and successfully classifies the temporal genetics of root gravitropism, the single time point mapping approach itself does not explicitly model the covariance across time points. Thus, the actual genetic inference step does not fully capture the phenotypic trajectories.
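The two post hoc metrics can be illustrated with a short sketch. The LOD values, the threshold of 3.0, and the rule that a transient QTL is one that passes on the maximum but not on the mean are assumptions made for illustration; they are not taken from Moore et al. [83].

```python
import numpy as np

# Hypothetical LOD scores: rows are loci, columns are time points.
lod = np.array([
    [4.1, 4.3, 3.9, 4.0],   # elevated at every time point
    [0.5, 0.7, 5.2, 0.6],   # elevated at a single time point
    [0.4, 0.3, 0.6, 0.5],   # never elevated
])

mean_lod = lod.mean(axis=1)   # metric associated with the persistent class
max_lod = lod.max(axis=1)     # metric associated with the transient class

threshold = 3.0               # illustrative cutoff, not from the source
persistent = mean_lod >= threshold
transient = (max_lod >= threshold) & ~persistent
```

Both metrics are computed after mapping, so neither uses the covariance among time points during the genetic inference itself.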

4.2 Functional Mapping

Several other approaches have been proposed that directly consider the trait trajectories for genetic analyses. With these approaches, the trait values across all time points can be modeled using parametric or non-parametric mathematical functions. These models describe the phenotypic trajectories using a few parameters (for a review of parametric models in the context of plant growth, see Paine et al. [84]). Once an appropriate model has been chosen, genetic inference or prediction can proceed using a single-step or two-step approach.

4.2.1 Single-Step Functional Mapping

In the single-step functional mapping approach, model fitting and genetic analyses are performed within a single statistical framework. In the plant community, the single-step approach for functional genetic inference/mapping was first proposed by Ma et al. [85] to map QTL for stem diameter in Populus. Since then, the functional mapping approach has been applied to longitudinal traits in other species, such as humans and mice, and has been extended into the mixed model framework used for GWAS [86,87,88,89,90]. The advantages of the single-step functional mapping approach are that it considers the full trait trajectories over time, yielding loci that influence the curve itself, and captures the covariance across time points, which should reduce residual variance and improve statistical power [88]. Essentially, at each locus, the single-step functional mapping approach models the mean trajectories for each genotype and tests whether the time-dependent genetic effects are non-zero.

There are two important considerations for the single-step functional mapping approach: (1) the choice of function to model the mean trajectories of each genotype, and (2) the appropriate residual covariance structure to account for the temporal nature of the data. The function to model the mean trajectories can be parametric or non-parametric and can be selected based on some prior knowledge of the phenotypic trajectories. For well-studied traits, such as growth, a number of parametric options exist that are biologically meaningful and can be easily applied to a longitudinal data set [84]. In cases in which no prior knowledge exists about the phenotypic trajectories, a non-parametric function, such as orthogonal Legendre polynomials or B-spline functions, can be utilized. The non-parametric functions are described in greater detail below. A number of covariance structures can be used to account for the temporal relationships among observations. The choice will be dependent on the balance between statistical efficiency and robustness. In the most robust case, the unstructured covariance matrix, the variance and covariance at each time point are unique and estimated from the data. While this places no constraints on the variance–covariances, the number of parameters that must be estimated can be prohibitively large for most studies. In many cases, simpler structures may be nearly as robust while estimating far fewer parameters.
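To make the trade-off concrete, the sketch below counts the parameters of an unstructured matrix and builds a first-order autoregressive, AR(1), matrix as one example of a simpler structure. AR(1) is chosen here only for illustration and is not named in the text; the function names are hypothetical.

```python
import numpy as np

def n_unstructured(t):
    """Number of variances plus covariances in an unstructured t x t matrix."""
    return t * (t + 1) // 2

def ar1_cov(t, sigma2, rho):
    """AR(1) covariance, sigma2 * rho**|i - j|: two parameters for any t."""
    lags = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    return sigma2 * rho ** lags

# Parameter counts for 3, 10, and 20 time points under the unstructured model.
counts = {t: n_unstructured(t) for t in (3, 10, 20)}

# A 4 x 4 AR(1) residual covariance with illustrative parameter values.
R = ar1_cov(4, sigma2=2.0, rho=0.5)
```

The unstructured count grows quadratically with the number of time points, whereas the AR(1) structure keeps two parameters at the cost of assuming that correlation decays geometrically with the lag.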

4.2.2 Two-Step Functional Mapping

In contrast to the single-step functional mapping approach, the two-step approach performs the model fitting and genetic analysis in two separate steps. First, the phenotypic trajectories are modeled for each individual, and the model parameters are used as derived traits for subsequent genetic analyses (e.g., GWAS, linkage analysis, or GS). This two-step approach has been successfully used to examine the genetic basis of rosette growth in Arabidopsis and for GWAS and GS of early vigor in rice [91, 92]. The advantages of this approach are that it is conceptually simple and easy to implement. Moreover, for most popular growth models, the parameters have biological meaning. For instance, growth can be modeled over the life cycle of the plant using a 3-parameter logistic function. Here, the inflection point can be calculated, which represents the transition from vegetative to reproductive growth. Thus, the researcher can select a specific attribute to study and select a specific model parameter that represents that attribute for analysis. Moreover, outside of genetic mapping, these parameters may provide biological insight into a plant’s phenotypic development. For instance, Campbell et al. [92] targeted a specific model parameter that described a plant’s growth rate and showed that the plant hormone gibberellic acid may influence natural variation for the rate-controlling parameter. However, the major disadvantage of this method is that information is lost between the functional modeling and genetic analysis steps. Since environmental factors are not included in the functional modeling step, the residuals likely contain important information regarding non-genetic components of the phenotypic variance for the longitudinal phenotype.
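The first step of the two-step approach can be sketched for a single plant as follows. This is a simplified illustration under stated assumptions: the trajectory is simulated without noise, and the fit uses a grid search over the asymptote combined with a linear fit on the logit scale, a stand-in for the nonlinear least-squares routines that would normally be used. The names `logistic3` and `fit_logistic3` are hypothetical.

```python
import numpy as np

def logistic3(t, K, r, t0):
    """Three-parameter logistic: asymptote K, rate r, inflection time t0."""
    return K / (1.0 + np.exp(-r * (t - t0)))

def fit_logistic3(t, y, n_grid=500):
    """Grid search over K; for a fixed K, log(y / (K - y)) is linear in t."""
    best = None
    for K in np.linspace(1.001 * y.max(), 2.0 * y.max(), n_grid):
        z = np.log(y / (K - y))        # equals r * (t - t0) under the model
        r, c = np.polyfit(t, z, 1)     # slope r, intercept c = -r * t0
        t0 = -c / r
        sse = np.sum((y - logistic3(t, K, r, t0)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, K, r, t0)
    return best[1:]

# Hypothetical noise-free trajectory for one plant (arbitrary units).
t = np.linspace(0.0, 30.0, 16)
y = logistic3(t, K=50.0, r=0.3, t0=15.0)
K_hat, r_hat, t0_hat = fit_logistic3(t, y)
```

Repeating the fit for every individual yields one value of each parameter per plant, and these values can then be passed to GWAS, linkage analysis, or GS as derived traits. In this parameterization the inflection point occurs at time `t0`, where the fitted curve reaches half of the asymptote `K`.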

4.3 Insights from Animal Breeding for Genomic Prediction Using Longitudinal Traits

While the use of longitudinal phenotypes is relatively new in plant science, animal breeders have targeted longitudinal traits for decades [93]. In animal breeding, breeders are often interested in the development of a trait across an animal’s life. For instance, in dairy cattle, test-day milk yields are collected routinely. Moreover, other traits, such as feed intake, growth, and egg production [94,95,96,97], have also been examined in a longitudinal framework. With the extensive use of these traits in animal breeding, numerous frameworks have been well developed to accommodate the time axis and have been used extensively for inference on genetic and environmental variance components, as well as pedigree and GS.

In the following subsection, we discuss several approaches that have been used for pedigree- or genomic-based prediction in animal breeding in a context that is applicable to plant breeding with HTP platforms. As mentioned above, a naive approach for GS using longitudinal data would be a univariate approach, in which a conventional mixed model is fitted at each time point. Here, we introduce the concept of longitudinal GS from a multivariate framework, as this is a relatively simple extension of the univariate approach, and extend these concepts to covariance functions and RR models that have been pioneered in animal breeding.

4.4 Multivariate Approaches for Longitudinal Genomic Prediction

To capture the covariance between time points, a logical progression from the univariate approach is to utilize a multivariate framework for longitudinal data. Thus, rather than considering the longitudinal trait as a consecutive series of measurements on the same trait, with the multivariate approach, we essentially ignore the order of the series and treat each time point as a separate trait. The multivariate framework allows each time point to have a unique variance and unique covariances between time points. The multivariate GS framework is well developed and has been widely utilized in both plant and animal systems. Moreover, the extension from the univariate approach is relatively straightforward.

Assume a simple case in which we are given three consecutive measurements for each individual. The model for each trait can be written as

$$ {\mathbf{y}}_1={\mathbf{X}}_1{\mathbf{b}}_1+{\mathbf{Z}}_1{\mathbf{u}}_1+{\boldsymbol{\varepsilon}}_1 $$

(1)

$$ {\mathbf{y}}_2={\mathbf{X}}_2{\mathbf{b}}_2+{\mathbf{Z}}_2{\mathbf{u}}_2+{\boldsymbol{\varepsilon}}_2 $$

(2)

$$ {\mathbf{y}}_3={\mathbf{X}}_3{\mathbf{b}}_3+{\mathbf{Z}}_3{\mathbf{u}}_3+{\boldsymbol{\varepsilon}}_3 $$

(3)

where $ {\mathbf{y}}_i $ is the vector of observations for trait i; $ {\mathbf{X}}_i $ and $ {\mathbf{Z}}_i $ are the incidence matrices for fixed effects and random effects, respectively, for trait i; $ {\mathbf{b}}_i $ is the vector of fixed effects for trait i; $ {\mathbf{u}}_i $ is the vector of random genetic effects for trait i; and $ {\boldsymbol{\varepsilon}}_i $ is the vector of residuals for trait i. Thus, the multivariate model is

$$ \left[\begin{array}{c}{\mathbf{y}}_1\\ {\mathbf{y}}_2\\ {\mathbf{y}}_3\end{array}\right]=\left[\begin{array}{ccc}{\mathbf{X}}_1& 0& 0\\ 0& {\mathbf{X}}_2& 0\\ 0& 0& {\mathbf{X}}_3\end{array}\right]\left[\begin{array}{c}{\mathbf{b}}_1\\ {\mathbf{b}}_2\\ {\mathbf{b}}_3\end{array}\right]+\left[\begin{array}{ccc}{\mathbf{Z}}_1& 0& 0\\ 0& {\mathbf{Z}}_2& 0\\ 0& 0& {\mathbf{Z}}_3\end{array}\right]\left[\begin{array}{c}{\mathbf{u}}_1\\ {\mathbf{u}}_2\\ {\mathbf{u}}_3\end{array}\right]+\left[\begin{array}{c}{\boldsymbol{\varepsilon}}_1\\ {\boldsymbol{\varepsilon}}_2\\ {\boldsymbol{\varepsilon}}_3\end{array}\right] $$

(4)

Moreover, as mentioned above, we assume unique variances and covariances for each trait/time point.

$$ \operatorname{var}\left[\begin{array}{c}{\mathbf{u}}_1\\ {\mathbf{u}}_2\\ {\mathbf{u}}_3\\ {\boldsymbol{\varepsilon}}_1\\ {\boldsymbol{\varepsilon}}_2\\ {\boldsymbol{\varepsilon}}_3\end{array}\right]=\left[\begin{array}{cccccc}
\mathbf{G}{\sigma_g^2}_{11}& \mathbf{G}{\sigma_g^2}_{12}& \mathbf{G}{\sigma_g^2}_{13}& 0& 0& 0\\
\mathbf{G}{\sigma_g^2}_{21}& \mathbf{G}{\sigma_g^2}_{22}& \mathbf{G}{\sigma_g^2}_{23}& 0& 0& 0\\
\mathbf{G}{\sigma_g^2}_{31}& \mathbf{G}{\sigma_g^2}_{32}& \mathbf{G}{\sigma_g^2}_{33}& 0& 0& 0\\
0& 0& 0& \mathbf{I}{\sigma_{\varepsilon}^2}_{11}& \mathbf{I}{\sigma_{\varepsilon}^2}_{12}& \mathbf{I}{\sigma_{\varepsilon}^2}_{13}\\
0& 0& 0& \mathbf{I}{\sigma_{\varepsilon}^2}_{21}& \mathbf{I}{\sigma_{\varepsilon}^2}_{22}& \mathbf{I}{\sigma_{\varepsilon}^2}_{23}\\
0& 0& 0& \mathbf{I}{\sigma_{\varepsilon}^2}_{31}& \mathbf{I}{\sigma_{\varepsilon}^2}_{32}& \mathbf{I}{\sigma_{\varepsilon}^2}_{33}
\end{array}\right] $$

(5)

Thus, for this simple case, we are capturing the full covariance across the three time points and leveraging this covariance to predict unique genetic values at each. However, notice the dimensions of the covariances $ {\sigma}_g^2 $ and $ {\sigma}_{\varepsilon}^2 $. Here, we must solve for 12 parameters. If we have a very large population, this may not be an issue. However, if we consider a more realistic data set from HTP, it is likely that we will have many more time points. Thus, for t time points, we will need to estimate t variances and t(t − 1)∕2 covariances for both the genetic effects and residuals. For most HTP studies, this will create unnecessary computational demands. Moreover, additional challenges could be experienced if the parameter estimates are near the bounds, which may yield inaccurate estimates of variance components. Thus, when faced with larger longitudinal data sets (t > 5), the researcher should question whether it is necessary to estimate each covariance. In cases in which the measurements are taken at short intervals within a given developmental period, it is likely safe to assume that the genetic variances between adjacent time points will be very similar. Therefore, a much simpler model may still capture much of the covariance while estimating fewer parameters. This is discussed in detail below. For other cases in which fewer measurements are recorded over more widely spaced intervals, the previous assumption may not hold true, and the full, unstructured matrix used in the multi-trait framework may be a more accurate model.
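The parameter count and the block structure of Eq. (5) can be checked with a short sketch. The covariance values and the 2 × 2 relationship matrix are hypothetical and chosen only for illustration; `n_cov_params` is an assumed helper name.

```python
import numpy as np

def n_cov_params(t):
    """t variances plus t(t - 1)/2 covariances, for both genetic and residual parts."""
    per_matrix = t + t * (t - 1) // 2
    return 2 * per_matrix

# Hypothetical genetic covariance among three time points and a
# hypothetical genomic relationship matrix G for two individuals.
Sigma_g = np.array([[1.0, 0.8, 0.6],
                    [0.8, 1.2, 0.9],
                    [0.6, 0.9, 1.5]])
G = np.array([[1.0, 0.25],
              [0.25, 1.0]])

# Block (i, j) of var(u) is G scaled by Sigma_g[i, j], i.e. a Kronecker product.
var_u = np.kron(Sigma_g, G)

n_three = n_cov_params(3)     # the three time point example in the text
n_twenty = n_cov_params(20)   # a hypothetical HTP experiment with 20 time points
```

With three time points the count is 12, matching the text, whereas 20 time points would already require 420 variance and covariance parameters under the unstructured model.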

4.5 Covariance Functions and Random Regression Models for Longitudinal Genetic Prediction

In the multi-trait framework, we treat the longitudinal phenotype, say growth, as a collection of separate traits; as a result, we are limited to making predictions at time points with records. However, in most longitudinal studies, we are interested in learning about the development of a continuous trait over time and do so by taking measurements at discrete time points. The time points or intervals themselves may be selected somewhat arbitrarily, and we seek to fill in information between time points. Thus, to capture the full trajectory of trait development, we can separate the trajectory into infinitesimally small intervals. Therefore, if we view the longitudinal trait as an “infinite-dimensional” trait, we can see that the multivariate framework is inadequate, in that it does not directly consider the time axis and it does not allow us to make predictions at time points without observations.

Kirkpatrick et al. [98] initially proposed the use of covariance functions (CFs) for the analysis of “infinite-dimensional” traits. A CF is simply the infinite-dimensional equivalent of a covariance matrix for a given number of time points. Using this approach, the covariance between any two records measured at given time points can be obtained using only the time points and some coefficients. For an “infinite-dimensional” trait, there can be an infinite number of coefficients; however, in practice, the number of coefficients is dependent on the number of time points with records, with the maximum number of coefficients being t(t + 1)∕2.

Following the example described in the multi-trait section above, we provide a brief example of the CF approach. Assume we have a trait measured at three time points using the covariance matrix in Kirkpatrick et al. [98]. Using the multi-trait approach, we estimate the 3 × 3 additive genetic covariance matrix ($ \hat{\boldsymbol{\Sigma}} $) and estimate the variances and covariances at each of the three time points. The goal of the approach described by Kirkpatrick et al. [98] is to represent the additive genetic covariance matrix ($ \hat{\boldsymbol{\Sigma}} $) as a continuous covariance function ($ \mathcal{K} $) given data collected at discrete time points. Although a number of methods can be used to estimate $ \mathcal{K} $ from $ \hat{\boldsymbol{\Sigma}} $, orthogonal polynomials are used most often due to the low correlations among the estimated coefficients [99].

Given a covariance function with a full rank fit (i.e., the order of the polynomials is equal to the number of time points, k = t), Kirkpatrick et al. [98] showed that the observed covariance matrix $ \hat{\boldsymbol{\Sigma}} $ can be expressed as $ \hat{\boldsymbol{\Sigma}}=\boldsymbol{\Phi} \mathbf{K}\boldsymbol{\Phi}^{\prime } $, where K is a coefficient matrix associated with the CF, and Φ is a t × k matrix of normalized Legendre polynomials evaluated at the time points, with k being the order of the Legendre polynomials (in this case k = t). Φ is defined by the Legendre polynomial functions via Φ = M Λ. With Legendre polynomials, the time points are standardized so that they span the interval from −1 to 1, and here, M is a matrix of powers of the standardized time points (columns 1, t, t², and so on, where t in these expressions denotes the standardized time). Λ is a k × k matrix of coefficients of the Legendre polynomials. The first two Legendre polynomials are P₀(t) = 1 and P₁(t) = t, and the subsequent Legendre polynomials are given by the recurrence $ {P}_{j+1}(t)=\frac{1}{j+1}\left[\left(2j+1\right)t{P}_j(t)-j{P}_{j-1}(t)\right] $. These can be normalized to ϕ_j via $ {\phi}_j(t)=\sqrt{\frac{2j+1}{2}}{P}_j(t) $. Thus, the first three normalized Legendre polynomials will be ϕ₀(t) = 0.7071, ϕ₁(t) = 1.2247t, and ϕ₂(t) = −0.7906 + 2.3717t². Thus, Λ is

$$ \boldsymbol{\Lambda}=\left[\begin{array}{ccc}0.7071& 0& -0.7906\\ 0& 1.2247& 0\\ 0& 0& 2.3717\end{array}\right] $$

(6)
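The entries of Λ can be reproduced from the recurrence and the normalization with a few lines of NumPy. This is a minimal sketch: the function names are hypothetical, and the three equally spaced standardized time points are an assumption made only to show how Φ = MΛ is formed.

```python
import numpy as np

def legendre_coeffs(k):
    """Rows are P_0..P_{k-1}; columns hold the coefficients of t^0..t^{k-1}."""
    P = np.zeros((k, k))
    P[0, 0] = 1.0
    if k > 1:
        P[1, 1] = 1.0
    for j in range(1, k - 1):
        t_Pj = np.roll(P[j], 1)   # multiply P_j by t (shift coefficients up one power)
        P[j + 1] = ((2 * j + 1) * t_Pj - j * P[j - 1]) / (j + 1)
    return P

def normalized_lambda(k):
    """Lambda: column j holds the coefficients of the normalized polynomial phi_j."""
    scale = np.sqrt((2 * np.arange(k) + 1) / 2.0)
    return (legendre_coeffs(k) * scale[:, None]).T

Lam = normalized_lambda(3)

# Three equally spaced time points standardized to [-1, 1] (assumed design).
t_std = np.array([-1.0, 0.0, 1.0])
M = np.vander(t_std, 3, increasing=True)   # columns: 1, t, t^2
Phi = M @ Lam
```

Each row of `Phi` contains ϕ₀, ϕ₁, and ϕ₂ evaluated at one standardized time point, so the same code yields basis values for any time point inside the observed range.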

It is of particular importance to note that Φ is dependent on neither the values nor the time points in the data set; only K is. Thus, given the 3 × 3 covariance matrix $ \hat{\boldsymbol{\Sigma}} $, the covariance between any two time points $ {a}_1 $ and $ {a}_2 $ can be obtained using $ \mathcal{K}\left({a}_1,{a}_2\right)={\sum}_{i=0}^{\infty }{\sum}_{j=0}^{\infty }{\mathbf{K}}_{ij}{\phi}_i\left({a}_1\right){\phi}_j\left({a}_2\right) $, and the breeding value at any time point $ {d}_t $ can be obtained using $ {\hat{g}}_t={\sum}_{i=0}^{k-1}{\phi}_i\left({d}_t\right){u}_i $. Moreover, with a full rank fit, the covariance matrix obtained is equivalent to that obtained using the multivariate approach in the previous section.
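A full rank fit can be sketched numerically as follows. The covariance matrix and the measurement ages are hypothetical values chosen for illustration (they are not the matrix of Kirkpatrick et al. [98]), and `phi_matrix`, `standardize`, and `cov_fn` are assumed helper names. The sketch relies on `numpy.polynomial.legendre.legvander` to evaluate the unnormalized Legendre polynomials.

```python
import numpy as np
from numpy.polynomial import legendre

def phi_matrix(t_std, k):
    """Normalized Legendre polynomials phi_0..phi_{k-1} at standardized times."""
    P = legendre.legvander(np.asarray(t_std, dtype=float), k - 1)
    return P * np.sqrt((2 * np.arange(k) + 1) / 2.0)

# Hypothetical 3 x 3 additive genetic covariance matrix and measurement ages.
Sigma_hat = np.array([[1.0, 0.8, 0.5],
                      [0.8, 1.5, 1.1],
                      [0.5, 1.1, 2.0]])
ages = np.array([10.0, 20.0, 30.0])

def standardize(a):
    """Map ages onto the interval [-1, 1] spanned by the observed ages."""
    a = np.asarray(a, dtype=float)
    return -1.0 + 2.0 * (a - ages.min()) / (ages.max() - ages.min())

# Full rank fit: Sigma_hat = Phi K Phi', so K = Phi^{-1} Sigma_hat Phi'^{-1}.
Phi = phi_matrix(standardize(ages), 3)
Phi_inv = np.linalg.inv(Phi)
K = Phi_inv @ Sigma_hat @ Phi_inv.T

def cov_fn(a1, a2):
    """Covariance function evaluated at two ages within the observed range."""
    p1 = phi_matrix(np.atleast_1d(standardize(a1)), 3)
    p2 = phi_matrix(np.atleast_1d(standardize(a2)), 3)
    return (p1 @ K @ p2.T)[0, 0]

cov_15_25 = cov_fn(15.0, 25.0)   # covariance between two unobserved ages
```

At the observed ages the function returns the entries of the original matrix, which is the equivalence with the multivariate approach noted above; between observed ages it interpolates.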

In most cases, the full covariance matrix $ \hat{\boldsymbol{\Sigma}} $ is unknown; therefore, it must be estimated directly from the data. As shown by Meyer and Hill [100], this can be done by a reparameterization of the multivariate or “finite-dimensional” approach. However, in many studies, particularly those focused on the analysis of longitudinal milk production in dairy cattle, RR models (e.g., test-day models) are most commonly used. The RR approach proposed by Schaeffer [93] regresses the phenotypic trait trajectories directly on Φ to estimate K. As demonstrated by Meyer and Hill [100], the CF and RR approaches are equivalent. The general form of the RR model is

$$ {y}_{tij}=F{E}_i+\sum \limits_{k=0}^{nf}{\phi}_{jtk}{\beta}_k+\sum \limits_{k=0}^{nr}{\phi}_{jtk}{u}_{jk}+\sum \limits_{k=0}^{nr}{\phi}_{jtk}p{e}_{jk}+{\varepsilon}_{tij} $$

(7)

Here, $ F{E}_i $ is the fixed effect for the ith group; $ {\phi}_{jtk} $ is the kth Legendre polynomial for individual j at time t; $ {\beta}_k $ is the fixed regression coefficient for the kth Legendre polynomial, which represents the overall mean trait trajectory for the population or group; $ {u}_{jk} $ is the genetic value for the kth Legendre polynomial for the jth individual; $ p{e}_{jk} $ is the permanent environmental effect for the kth Legendre polynomial for the jth individual; and nf and nr are the orders of the fixed and random regression polynomials, respectively. This permanent environmental effect is a stable, perpetual, non-genetic effect that influences an individual’s trait trajectory. It is assumed to be common to all repeated observations on the same individual. Thus, the residuals $ {\varepsilon}_{tij} $ can be considered temporary environmental effects. In matrix form, the RR model can be written as y = Xb + Za + Qp + ɛ, where a and p are the vectors of random regression coefficients for the genetic and permanent environmental effects, respectively.
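Once the random regression coefficients are available, genetic values on any day follow directly from the Legendre basis. The sketch below assumes a second-order fit, 20 hypothetical imaging days, simulated coefficients for four individuals, and an illustrative coefficient covariance matrix `K_g`; in practice these quantities would be estimated from the data with mixed model software, which is outside the scope of this sketch.

```python
import numpy as np
from numpy.polynomial import legendre

def phi_matrix(t_std, k):
    """Normalized Legendre polynomials phi_0..phi_{k-1} at standardized times."""
    P = legendre.legvander(np.asarray(t_std, dtype=float), k - 1)
    return P * np.sqrt((2 * np.arange(k) + 1) / 2.0)

days = np.arange(1, 21)                      # 20 hypothetical imaging days
t_std = -1.0 + 2.0 * (days - days.min()) / (days.max() - days.min())
Phi = phi_matrix(t_std, 3)                   # 20 x 3 basis for a second-order fit

# Simulated random regression coefficients u_jk for 4 individuals (rows).
rng = np.random.default_rng(1)
U = rng.normal(size=(4, 3))

# Genetic value of individual j on day t: sum over k of phi_k(t) * u_jk.
g_hat = U @ Phi.T                            # 4 individuals x 20 days

# Illustrative covariance of the genetic regression coefficients.
K_g = np.array([[2.0, 0.3, -0.1],
                [0.3, 0.5, 0.05],
                [-0.1, 0.05, 0.1]])
Sigma_g = Phi @ K_g @ Phi.T                  # implied 20 x 20 genetic covariance among days
```

Six distinct elements of `K_g` describe the genetic covariance among all 20 days, and the diagonal of `Sigma_g` shows how the genetic variance changes over the course of the experiment.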

In the examples above, we used a full-order polynomial to model the covariance across time points. As in the multivariate example, this requires estimation of a large number of parameters and in most cases is computationally unfeasible and could lead to convergence problems or inaccurate parameter estimates. In most cases, it is much more advantageous to fit a simpler model using a reduced-order polynomial (k < t). This effectively allows fewer parameters to be estimated while still adequately describing $ \hat{\boldsymbol{\Sigma}} $. Generally speaking, the goodness of fit will increase as the number of function parameters describing the curve increases [101]. Campbell et al. [102] used RR models for rice shoot growth trajectories and demonstrated that the model could be used for longitudinal genomic prediction. Baba et al. [103] showed the utility of a multi-trait RR model for genomic prediction of daily water usage in rice through joint modeling with shoot biomass.
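The saving from a reduced-order fit can be illustrated with a small sketch. It assumes six equally spaced time points, a covariance matrix that is generated exactly by a k = 3 covariance function, and an ordinary least-squares projection onto the reduced basis; the least-squares step is a simplification chosen for illustration and is not necessarily the estimator used in the cited studies.

```python
import numpy as np
from numpy.polynomial import legendre

def phi_matrix(t_std, k):
    """Normalized Legendre polynomials phi_0..phi_{k-1} at standardized times."""
    P = legendre.legvander(np.asarray(t_std, dtype=float), k - 1)
    return P * np.sqrt((2 * np.arange(k) + 1) / 2.0)

t_points, k = 6, 3
t_std = np.linspace(-1.0, 1.0, t_points)
Phi_k = phi_matrix(t_std, k)                 # 6 x 3 reduced basis

# Hypothetical coefficient matrix and the 6 x 6 covariance matrix it implies.
K_true = np.array([[2.0, 0.3, -0.1],
                   [0.3, 0.5, 0.05],
                   [-0.1, 0.05, 0.1]])
Sigma_hat = Phi_k @ K_true @ Phi_k.T

# Least-squares projection of Sigma_hat onto the reduced basis.
Phi_pinv = np.linalg.pinv(Phi_k)
K_red = Phi_pinv @ Sigma_hat @ Phi_pinv.T

n_full = t_points * (t_points + 1) // 2      # parameters in the unstructured matrix
n_red = k * (k + 1) // 2                     # parameters in the reduced-order fit
```

Because the simulated matrix lies exactly in the span of the reduced basis, the projection recovers `K_true` with 6 parameters instead of 21. With real data the reduced fit would only approximate the observed matrix, and the order k has to be chosen by comparing goodness of fit.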

5 Conclusions

This chapter described statistical methods for analyzing large-scale HTP data in quantitative genetics. We contend that the integration of HTP data into quantitative genetics models triggers a great leap forward in plant breeding. In particular, we discussed (1) the genetic gain that can be achieved using HTP data, (2) the use of HTP data as predictive covariates in GS models, and (3) the modeling of temporal HTP data using RR models. In GS, it is known that the accuracy of genomic prediction, and thus the response to selection, decreases as the selection cycle advances [104, 105]. To maintain the response to selection, it is necessary to update the model on a regular basis [105,106,107]. In order to update the model, it is necessary to conduct a field test to measure phenotypes and to obtain genome-wide markers for many genotypes. At this step, phenotypic measurement for model updating may become a serious bottleneck of GS breeding. Thus, it is important to utilize HTP, which can evaluate many genotypes and possibly shorten the time required for selection.

High-throughput phenotyping and phenomics offer numerous opportunities to understand plant development, the genetics of quantitative traits such as yield, and their connection to the environment. The utilization of HTP data that are correlated with traits of interest can change how breeders select their material for advancement. Incorporating HTP data into prediction models has the potential to increase prediction accuracy, thus enabling plant breeders to select and discard more accurately. Although the reviewed studies considered different models, they concluded that regardless of the model configuration, the inclusion of HTP data increased the prediction performance when it was combined with different data types (marker genotype, pedigree, and environment). Additional gains can be expected when considering interactions with environmental factors.

The RR approach offers several advantages over the multivariate approach. As mentioned above, it allows environmental effects to be partitioned into permanent and temporary environmental effects. Moreover, the RR approach models individual-specific deviations from the mean phenotypic trajectory of the population. This allows the shape, amplitude, and intercept of the phenotypic trajectory to be unique for each individual and allows the genetic and permanent environmental effects to vary throughout trait development. Thus, the RR model should more accurately reflect the biological processes that give rise to the phenotype. Furthermore, RR models offer a robust framework for fitting reduced-order covariance functions, which gives a computational advantage over the multivariate approach in that the model converges more quickly. In addition, by estimating only the parameters that are necessary to describe the data, sampling errors can be minimized. Finally, the RR approach allows the researcher to study how genetic variability changes over time and enables the selection of individuals to alter phenotypic trajectories.
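To make the last point concrete, here is a minimal sketch of how time-specific genetic variance and heritability could be derived from RR (co)variance estimates. All values are hypothetical and chosen only for illustration: the coefficient covariance matrices `K_g` (additive genetic) and `K_pe` (permanent environmental), the residual variance `sigma2_e`, and the number of time points. A first-order (linear) unnormalized Legendre basis is assumed.

```python
import numpy as np
from numpy.polynomial import legendre

n_times = 10                          # hypothetical number of time points
x = np.linspace(-1.0, 1.0, n_times)   # time rescaled to [-1, 1]
Phi = legendre.legvander(x, 1)        # columns: intercept, linear term

# Hypothetical covariance matrices of the random regression coefficients.
K_g = np.array([[3.0, 0.8],
                [0.8, 1.0]])          # additive genetic
K_pe = np.array([[1.5, 0.2],
                 [0.2, 0.5]])         # permanent environmental
sigma2_e = 1.0                        # temporary (residual) variance


def variance_over_time(basis: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Diagonal of basis @ K @ basis.T, i.e., the variance at each time point."""
    return np.einsum("ij,jk,ik->i", basis, K, basis)


var_g = variance_over_time(Phi, K_g)
var_pe = variance_over_time(Phi, K_pe)

# Heritability at each time point under this variance partition.
h2 = var_g / (var_g + var_pe + sigma2_e)

for time_index, (vg, h) in enumerate(zip(var_g, h2)):
    print(f"time {time_index}: genetic variance = {vg:.2f}, h2 = {h:.2f}")
```

With these made-up inputs, both the genetic variance and the heritability rise from the first to the last time point, which is the kind of temporal pattern an RR model lets the researcher inspect.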

References

Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS One 6(5):e19379
Meuwissen T, Hayes B, Goddard M (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157(4):1819–1829
De Los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, et al (2009) Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182(1):375–385
Crossa J, Campos Gdl, Pérez P, Gianola D, Burgueño J, Araus JL, et al (2010) Prediction of genetic values of quantitative traits in plant breeding using pedigree and molecular markers. Genetics 186(2):713–724
Aguilar I, Misztal I, Johnson D, Legarra A, Tsuruta S, Lawlor T (2010) Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci 93(2):743–752
Christensen OF, Lund MS (2010) Genomic prediction when some animals are not genotyped. Genet Sel Evol 42(1):2
Burgueño J, de los Campos G, Weigel K, Crossa J (2012) Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci 52(2):707–719
Heslot N, Akdemir D, Sorrells ME, Jannink JL (2014) Integrating environmental covariates and crop modeling into the genomic selection framework to predict genotype by environment interactions. Theor Appl Genet 127(2):463–480
Jarquín D, Crossa J, Lacaze X, Du Cheyron P, Daucourt J, Lorgeou J, et al (2014) A reaction norm model for genomic selection using high-dimensional genomic and environmental data. Theor Appl Genet 127(3):595–607
Pérez-Rodríguez P, Crossa J, Rutkoski J, Poland J, Ravi S, Legarra A, et al (2017) Single-step genomic and pedigree genotype × environment interaction models for predicting wheat lines in international environments. Plant Genome 10(2)
White JW, Andrade-Sanchez P, Gore MA, Bronson KF, Coffelt TA, Conley MM, et al (2012) Field-based phenomics for plant genetics research. Field Crops Res 133:101–112
Cobb JN, DeClerck G, Greenberg A, Clark R, McCouch S (2013) Next-generation phenotyping: requirements and strategies for enhancing our understanding of genotype–phenotype relationships and its relevance to crop improvement. Theor Appl Genetics 126(4):867–887
Crain J, Mondal S, Rutkoski J, Singh RP, Poland J (2018) Combining high-throughput phenotyping and genomic information to increase prediction and selection accuracy in wheat breeding. Plant Genome 11(1):170043
Furbank RT, Tester M (2011) Phenomics–technologies to relieve the phenotyping bottleneck. Trends Plant Sci 16(12):635–644
Reynolds D, Baret F, Welcker C, Bostrom A, Ball J, Cellini F, et al (2019) What is cost-efficient phenotyping? Optimizing costs for different scenarios. Plant Sci 282:14–22
Fiorani F, Schurr U (2013) Future scenarios for plant phenotyping. Ann Rev Plant Biol 64:267–291
Araus JL, Kefauver SC, Zaman-Allah M, Olsen MS, Cairns JE (2018) Translating high-throughput phenotyping into genetic gain. Trends Plant Sci 23(5):451–466
Fahlgren N, Gehan MA, Baxter I (2015) Lights, camera, action: high-throughput plant phenotyping is ready for a close-up. Curr Opin Plant Biol 24:93–99
Liebisch F, Kirchgessner N, Schneider D, Walter A, Hund A (2015) Remote, aerial phenotyping of maize traits with a mobile multi-sensor approach. Plant Methods 11(1):9
White JW, Conley MM (2013) A flexible, low-cost cart for proximal sensing. Crop Sci 53(4):1646–1649
Andrade-Sanchez P, Gore MA, Heun JT, Thorp KR, Carmo-Silva AE, French AN, et al (2014) Development and evaluation of a field-based high-throughput phenotyping platform. Funct Plant Biol 41(1):68–79
Sankaran S, Khot LR, Espinoza CZ, Jarolmasjed S, Sathuvalli VR, Vandemark GJ, et al (2015) Low-altitude, high-resolution aerial imaging systems for row and field crop phenotyping: a review. Eur J Agron 70:112–123
Yang G, Liu J, Zhao C, Li Z, Huang Y, Yu H, et al (2017) Unmanned aerial vehicle remote sensing for field-based crop phenotyping: current status and perspectives. Front Plant Sci 8:1111
Tattaris M, Reynolds MP, Chapman SC (2016) A direct comparison of remote sensing approaches for high-throughput phenotyping in plant breeding. Front Plant Sci 7:1131
Kefauver SC, Vicente R, Vergara-Díaz O, Fernandez-Gallego JA, Kerfal S, Lopez A, et al (2017) Comparative UAV and field phenotyping to assess yield and nitrogen use efficiency in hybrid and conventional barley. Front Plant Sci 8:1733
Madec S, Baret F, De Solan B, Thomas S, Dutartre D, Jezequel S, et al (2017) High-throughput phenotyping of plant height: comparing unmanned aerial vehicles and ground LiDAR estimates. Front Plant Sci 8:2002
Yu K, Anderegg J, Mikaberidze A, Karisto P, Mascher F, McDonald BA, et al (2018) Hyperspectral canopy sensing of wheat Septoria tritici blotch disease. Front Plant Sci 9:1195
Yuan W, Li J, Bhatta M, Shi Y, Baenziger P, Ge Y (2018) Wheat height estimation using LiDAR in comparison to ultrasonic sensor and UAS. Sensors 18(11):3731
Sun S, Li C, Paterson AH, Jiang Y, Xu R, Robertson JS, et al (2018) In-field high throughput phenotyping and cotton plant growth analysis using LiDAR. Front Plant Sci 9:16
Wang X, Singh D, Marla S, Morris G, Poland J (2018) Field-based high-throughput phenotyping of plant height in sorghum using different sensing technologies. Plant Methods 14(1):53
Shi Y, Thomasson JA, Murray SC, Pugh NA, Rooney WL, Shafian S, et al (2016) Unmanned aerial vehicles for high-throughput phenotyping and agronomic research. PloS One 11(7):e0159781
Watanabe K, Guo W, Arai K, Takanashi H, Kajiya-Kanegae H, Kobayashi M, et al (2017) High-throughput phenotyping of sorghum plant height using an unmanned aerial vehicle and its application to genomic prediction modeling. Front Plant Sci 8:421
Han L, Yang G, Yang H, Xu B, Li Z, Yang X (2018) Clustering field-based maize phenotyping of plant-height growth and canopy spectral dynamics using a UAV remote-sensing approach. Front Plant Sci 9:1638
Li J, Shi Y, Veeranampalayam-Sivakumar AN, Schachtman DP (2018) Elucidating sorghum biomass, nitrogen and chlorophyll contents with spectral and morphological traits derived from unmanned aircraft system. Front Plant Sci 9:1406
Pugh N, Horne DW, Murray SC, Carvalho G, Malambo L, Jung J, et al (2018) Temporal estimates of crop growth in sorghum and maize breeding enabled by unmanned aerial systems. Plant Phenome J 1(1):1–10
Spindel JE, Dahlberg J, Colgan M, Hollingsworth J, Sievert J, Staggenborg SH, et al (2018) Association mapping by aerial drone reveals 213 genetic associations for Sorghum bicolor biomass traits under drought. BMC Genomics 19(1):679
Chapman SC, Merz T, Chan A, Jackway P, Hrabar S, Dreccer MF, et al (2014) Pheno-Copter: a low-altitude, autonomous remote-sensing robotic helicopter for high-throughput field-based phenotyping. Agronomy 4(2):279–301
Guo W, Zheng B, Potgieter AB, Diot J, Watanabe K, Noshita K, et al (2018) Aerial imagery analysis–quantifying appearance and number of sorghum heads for applications in breeding and agronomy. Front Plant Sci 9:1544
Xu R, Li C, Paterson AH, Jiang Y, Sun S, Robertson JS (2018) Aerial images and convolutional neural network for cotton bloom detection. Front Plant Sci 8:2235
Haghighattalab A, Pérez LG, Mondal S, Singh D, Schinstock D, Rutkoski J, et al (2016) Application of unmanned aerial systems for high throughput phenotyping of large wheat breeding nurseries. Plant Methods 12(1):35
Zaman-Allah M, Vergara O, Araus J, Tarekegne A, Magorokosho C, Zarco-Tejada P, et al (2015) Unmanned aerial platform-based multi-spectral imaging for field phenotyping of maize. Plant Methods 11(1):35
Condorelli GE, Maccaferri M, Newcomb M, Andrade-Sanchez P, White JW, French AN, et al (2018) Comparative aerial and ground based high throughput phenotyping for the genetic dissection of NDVI as a proxy for drought adaptive traits in durum wheat. Front Plant Sci 9:893
Potgieter AB, George-Jaeggli B, Chapman SC, Laws K, Suárez Cadavid LA, Wixted J, et al (2017) Multi-spectral imaging from an unmanned aerial vehicle enables the assessment of seasonal leaf area dynamics of sorghum breeding lines. Front Plant Sci 8:1532
Kyratzis AC, Skarlatos DP, Menexes GC, Vamvakousis VF, Katsiotis A (2017) Assessment of vegetation indices derived by UAV imagery for durum wheat phenotyping under a water limited and heat stressed Mediterranean environment. Front Plant Sci 8:1114
Hassan MA, Yang M, Rasheed A, Jin X, Xia X, Xiao Y, et al (2018) Time-series multispectral indices from unmanned aerial vehicle imagery reveal senescence rate in bread wheat. Remote Sensing 10(6):809
Vergara-Díaz O, Zaman-Allah MA, Masuka B, Hornero A, Zarco-Tejada P, Prasanna BM, et al (2016) A novel remote sensing approach for prediction of maize yield under different conditions of nitrogen fertilization. Front Plant Sci 7:666
Jay S, Baret F, Dutartre D, Malatesta G, Héno S, Comar A, et al (2019) Exploiting the centimeter resolution of UAV multispectral imagery to improve remote-sensing estimates of canopy structure and biochemistry in sugar beet crops. Remote Sens Environ 231:110898
Verger A, Vigneau N, Chéron C, Gilliot JM, Comar A, Baret F (2014) Green area index from an unmanned aerial system over wheat and rapeseed crops. Remote Sens Environ 152:654–664
Khan Z, Rahimi-Eichi V, Haefele S, Garnett T, Miklavcic SJ (2018) Estimation of vegetation indices for high-throughput phenotyping of wheat using aerial imaging. Plant Methods 14(1):20
Ludovisi R, Tauro F, Salvati R, Khoury S, Mugnozza Scarascia G, Harfouche A (2017) UAV-based thermal imaging for high-throughput field phenotyping of black poplar response to drought. Front Plant Sci 8:1681
Acquaah G (2009) Principles of plant genetics and breeding. Wiley, New York
Thorp K, Thompson A, Harders S, French A, Ward R (2018) High-throughput phenotyping of crop water use efficiency via multispectral drone imagery and a daily soil water balance model. Remote Sens 10(11):1682
Allen RG, Pereira LS, Raes D, Smith M, et al (1998) Crop evapotranspiration: guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56. Fao, Rome. 300(9):D05109
Falconer DS, Mackay T (1996) Introduction to quantitative genetics. Oliver And Boyd, Edinburgh, London
Sankaran S, Khot LR, Carter AH (2015) Field-based crop phenotyping: Multispectral aerial imaging for evaluation of winter wheat emergence and spring stand. Comput Electron Agricult 118:372–379
Rutkoski J, Poland J, Mondal S, Autrique E, Pérez LG, Crossa J, et al (2016) Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3 Genes Genomes Genetics 6:g3–116
Elias AA, Rabbi I, Kulakow P, Jannink JL (2018) Improving genomic prediction in cassava field experiments using spatial analysis. G3 Genes Genomes Genetics 8(1):53–62
Xavier A, Hall B, Hearst AA, Cherkauer KA, Rainey KM (2017) Genetic architecture of phenomic-enabled canopy coverage in Glycine max. Genetics 206:p. genetics–116
Sun J, Rutkoski JE, Poland JA, Crossa J, Jannink JL, Sorrells ME (2017) Multitrait, random regression, or simple repeatability model in high-throughput phenotyping data improve genomic prediction for wheat grain yield. Plant Genome 10(2):plantgenome2016–11
Cabrera-Bosquet L, Crossa J, von Zitzewitz J, Serret MD, Luis Araus J (2012) High-throughput phenotyping and genomic selection: the frontiers of crop breeding converge. J Integr Plant Biol 54(5):312–320
Krause MR, González-Pérez L, Crossa J, Pérez-Rodríguez P, Montesinos-López O, Singh RP, et al (2019) Hyperspectral reflectance-derived relationship matrices for genomic prediction of grain yield in wheat. G3 Genes Genomes Genetics 9(4):1231–1247
Juliana P, Montesinos-López OA, Crossa J, Mondal S, Pérez LG, Poland J, et al (2019) Integrating genomic-enabled prediction and high-throughput phenotyping in breeding for climate-resilient bread wheat. Theoret Appl Genetics 132(1):177–194
Jarquin D, Howard R, Xavier A, Das Choudhury S (2018) Increasing predictive ability by modeling interactions between environments, genotype and canopy coverage image data for soybeans. Agronomy 8(4):51
Amani I, Fischer R, Reynolds M (1996) Canopy temperature depression association with yield of irrigated spring wheat cultivars in a hot climate. J Agronomy Crop Sci 176(2):119–129
Aguate FM, Trachsel S, Pérez LG, Burgueño J, Crossa J, Balzarini M, et al (2017) Use of hyperspectral image data outperforms vegetation indices in prediction of maize yield. Crop Sci 57(5):2517–2524
Montesinos-López OA, Montesinos-López A, Crossa J, los Campos G, Alvarado G, Suchismita M, et al (2017) Predicting grain yield using canopy hyperspectral reflectance in wheat breeding data. Plant Methods 13(1):4
Roorkiwal M, Jarquin D, Singh MK, Gaur PM, Bharadwaj C, Rathore A, et al (2018) Genomic-enabled prediction models using multi-environment trials to estimate the effect of genotype × environment interaction on prediction accuracy in chickpea. Sci Rep 8(1):11701
Sukumaran S, Crossa J, Jarquín D, Reynolds M (2017) Pedigree-based prediction models with genotype × environment interaction in multienvironment trials of CIMMYT wheat. Crop Sci 57(4):1865–1880
Jarquín D, Lemes da Silva C, Gaynor RC, Poland J, Fritz A, Howard R, et al (2017) Increasing genomic-enabled prediction accuracy by modeling genotype × environment interactions in Kansas Wheat. Plant Genome 10(2):plantgenome2016–12
Montesinos-López A, Montesinos-López OA, Cuevas J, Mata-López WA, Burgueño J, Mondal S, et al (2017) Genomic Bayesian functional regression models with interactions for predicting wheat grain yield using hyper-spectral image data. Plant Methods 13(1):62
Davis C (1962) The norm of the Schur product operation. Numer Math 4(1):343–344
Montesinos-López A, Montesinos-López OA, Campos G, Crossa J, Burgueño J, Luna-Vazquez FJ (2018) Bayesian functional regression as an alternative statistical analysis of high-throughput phenotyping data of modern agriculture. Plant Methods 14(1):46
Shavrukov Y, Kurishbayev A, Jatayev S, Shvidchenko V, Zotova L, Koekemoer F, et al (2017) Early flowering as a drought escape mechanism in plants: How can it aid wheat production? Front Plant Sci 8:1950
Zhang J, Yang C, Song H, Hoffmann WC, Zhang D, Zhang G (2016) Evaluation of an airborne remote sensing platform consisting of two consumer-grade cameras for crop identification. Remote Sens 8(3):257
Berger B, Parent B, Tester M (2010) High-throughput shoot imaging to study drought responses. J Exper Botany 61(13):3519–3528
Campbell MT, Knecht AC, Berger B, Brien CJ, Wang D, Walia H (2015) Integrating image-based phenomics and association analysis to dissect the genetic architecture of temporal salinity responses in rice. Plant Physiol 168(4):1476–1489
Apiolaza LA, Gilmour AR, Garrick DJ (2000) Variance modelling of longitudinal height data from a Pinus radiata progeny test. Can J For Res 30(4):645–654
Apiolaza LA, Garrick DJ (2001) Analysis of longitudinal data from progeny tests: some multivariate approaches. Forest Sci 47(2):129–140
de Souza Marçal T, Salvador FV, da Silva AC, Machado JC, Carneiro PCS, et al (2018) Genetic insights into elephantgrass persistence for bioenergy purpose. PloS One 13(9):e0203818
Wu WR, Li WM, Tang DZ, Lu HR, Worland A (1999) Time-related mapping of quantitative trait loci underlying tiller number in rice. Genetics 151(1):297–303
Yan J, Zhu J, He C, Benmoussa M, Wu P (1998) Molecular dissection of developmental behavior of plant height in rice (Oryza sativa L.). Genetics 150(3):1257–1265
Würschum T, Liu W, Busemeyer L, Tucker MR, Reif JC, Weissmann EA, et al (2014) Mapping dynamic QTL for plant height in triticale. BMC Genetics 15(1):59
Moore CR, Johnson LS, Kwak IY, Livny M, Broman KW, Spalding EP (2013) High-throughput computer vision introduces the time axis to a quantitative trait map of a plant growth response. Genetics 195:p. genetics–113
Paine CT, Marthews TR, Vogt DR, Purves D, Rees M, Hector A, et al (2012) How to fit nonlinear plant growth models and calculate growth rates: an update for ecologists. Methods Ecol Evol 3(2):245–256
Ma CX, Casella G, Wu R (2002) Functional mapping of quantitative trait loci underlying the character process: a theoretical framework. Genetics 161(4):1751–1762
Wu R, Ma CX, Zhao W, Casella G (2003) Functional mapping for quantitative trait loci governing growth rates: a parametric model. Physiol Genomics 14(3):241–249
Wu R, Lin M (2006) Functional mapping – how to map and study the genetic architecture of dynamic complex traits. Nat Rev Genetics 7(3):229
Das K, Li J, Wang Z, Tong C, Fu G, Li Y, et al (2011) A dynamic model for genome-wide association studies. Human Genetics 129(6):629–639
Cui Y, Zhu J, Wu R (2006) Functional mapping for genetic control of programmed cell death. Physiol Genomics 25(3):458–469
He Q, Berg A, Li Y, Vallejos CE, Wu R (2010) Mapping genes for plant structure, development and evolution: functional mapping meets ontology. Trends Genetics 26(1):39–46
Bac-Molenaar JA, Vreugdenhil D, Granier C, Keurentjes JJ (2015) Genome-wide association mapping of growth dynamics detects time-specific and general quantitative trait loci. J Exper Botany 66(18):5567–5580
Campbell MT, Du Q, Liu K, Brien CJ, Berger B, Zhang C, et al (2017) A comprehensive image-based phenomic analysis reveals the complex genetic architecture of shoot growth dynamics in rice (Oryza sativa). Plant Genome 10(2):plantgenome2016–07
Schaeffer L (1994) Random regressions in animal models for test-day production in dairy cattle. In: World Congress of Genetics Applied Livestock Production, vol 18. pp 443–446
Schnyder U, Hofer A, Labroue F, Künzi N (2001) Genetic parameters of a random regression model for daily feed intake of performance tested French Landrace and Large White growing pigs. Genet Sel Evol 33(6):635
Luo P, Yang R, Yang N (2007) Estimation of genetic parameters for cumulative egg numbers in a broiler dam line by using a random regression model. Poultry Sci 86(1):30
Baldi F, Albuquerque L, Alencar M (2010) Random regression models on Legendre polynomials to estimate genetic parameters for weights from birth to adult age in Canchim cattle. J Animal Breeding Genetics 127(4):289–299
Howard JT, Jiao S, Tiezzi F, Huang Y, Gray KA, Maltecca C (2015) Genome-wide association study on legendre random regression coefficients for the growth and feed intake trajectory on Duroc Boars. BMC Genetics 16(1):59
Kirkpatrick M, Lofsvold D, Bulmer M (1990) Analysis of the inheritance, selection and evolution of growth trajectories. Genetics 124(4):979–993
Schaeffer L (2004) Application of random regression models in animal breeding. Livest Prod Sci 86(1–3):35–45
Meyer K, Hill WG (1997) Estimation of genetic and phenotypic covariance functions for longitudinal or ‘repeated’ records by restricted maximum likelihood. Livest Prod Sci 47(3):185–200
Pool M, Meuwissen T (2000) Reduction of the number of parameters needed for a polynomial random regression test day model. Livest Prod Sci 64(2–3):133–145
Campbell M, Walia H, Morota G (2018) Utilizing random regression models for genomic prediction of a longitudinal trait derived from high-throughput phenotyping. Plant Direct 2(9):e00080
Baba T, Momen M, Campbell MT, Walia H, Morota G (2020) Multi-trait random regression models increase genomic prediction accuracy for a temporal physiological trait derived from high-throughput phenotyping. PloS One 15(2):e0228118
Habier D, Fernando R, Dekkers JC (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177(4):2389–2397
Jannink JL (2010) Dynamics of long-term genomic selection. Genet Sel Evol 42(1):35
Iwata H, Hayashi T, Tsumura Y (2011) Prospects for genomic selection in conifer breeding: a simulation study of Cryptomeria japonica. Tree Genet Genomes 7(4):747–758
Yabe S, Ohsawa R, Iwata H (2013) Potential of genomic selection for mass selection breeding in annual allogamous crops. Crop Sci 53(1):95–105

Author information

Authors and Affiliations

Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Gota Morota & Malachy T. Campbell
Agronomy Department, University of Florida, Gainesville, FL, USA
Diego Jarquin
Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
Hiroyoshi Iwata

Corresponding author

Correspondence to Gota Morota.

Editor information

Editors and Affiliations

Arkansas Biosciences Institute, Arkansas State University, Jonesboro, AR, USA
Argelia Lorence
Arkansas Biosciences Institute, Arkansas State University, Jonesboro, AR, USA
Karina Medina Jimenez

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this protocol

Cite this protocol

Morota, G., Jarquin, D., Campbell, M.T., Iwata, H. (2022). Statistical Methods for the Quantitative Genetic Analysis of High-Throughput Phenotyping Data. In: Lorence, A., Medina Jimenez, K. (eds) High-Throughput Plant Phenotyping. Methods in Molecular Biology, vol 2539. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2537-8_21

DOI: https://doi.org/10.1007/978-1-0716-2537-8_21
Published: 05 April 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2536-1
Online ISBN: 978-1-0716-2537-8
eBook Packages: Springer Protocols
