Following the previous definitions, we have selected a number of papers illustrating each of the four categories of use of NIRS in the context of breeding. The main features of these papers are summarized in Table 1 and detailed hereafter with respect to the type of technology employed to obtain spectra, the statistical pretreatments of the spectra, and the statistical model applied for phenotype prediction. We also provide in Table 2 a comparison of the relative performances of phenomic and genomic predictions for the very few papers that enable such a comparison. Finally, we discuss the factors that affect the predictive ability of PS.
Table 2 Comparative predictive ability reported for prediction based on G and H matrices
2.1 Types of Technology
Traditionally, NIRS measurements are conducted in laboratories under controlled conditions on either dried vegetative tissue (e.g., forages) or dried reproductive tissue (e.g., grain). This kind of data offers several advantages: measurements are robust, low cost, and routinely applied by breeders to predict quality traits. There are also disadvantages: substantial extra effort is needed to bring the material from the field to the lab and to dry it so that water absorbance (which overlaps the absorbance of other chemical bonds) is minimized. In these laboratory conditions, spectra comprise many wavelengths, possibly spanning the visible and near infrared (approximately 400 to 2500 nm), yielding a dataset of hundreds of variables [9].
With the rise of high-throughput phenotyping, spectrum measurements have benefited from technological developments that enable the direct collection of spectra in the field, possibly at several time points, such as hyperspectral imaging from an Unoccupied Aerial Vehicle (UAV) or direct measurements of fresh material with portable (micro-)spectrometers. Hyperspectral imaging records several wavelengths for each pixel, possibly at multiple time points, in the visible and in a small portion of the NIR spectrum. A reflectance measure is attributed either to groups of wavelengths (bands) or to individual wavelengths. The measurements at the pixel level can be integrated at the microplot level to characterize a unique variety [17, 22, 24, 25]. Portable (micro-)spectrometers have also been developed to measure reflectance directly in the field on intact fresh material, covering the visible and NIR spectrum [18]. Wavelengths can be used directly as variables in predictive models, or they can be summarized into indices such as VIs [17, 22]. VIs describe vegetation properties by condensing large amounts of data to facilitate the processing of camera and satellite images. However, in Aguate et al. [17], using all the hyperspectral bands achieved better prediction than using VIs individually.
Technologies used to collect NIR spectra are numerous, each with advantages and disadvantages. On the one hand, the use of NIR spectrometers in laboratory conditions is a robust method but can be time consuming due to the collection and possibly the preparation of samples. On the other hand, UAV and portable (micro-)spectrometers are quick techniques to collect NIRS, but the number of wavelengths available is usually reduced and measurements can be affected by environmental noise, which is harder to control in the field than in the laboratory. Depending on the application, trade-offs must be found between labor intensity, costs, and spectrum quality.
To date, very few studies have compared the predictive ability of different spectrum measurements, especially in the context of plant breeding. Recently, Zgouz et al. [27] reported a dataset of spectra collected on 60 sugarcane samples with 8 visible/NIR spectrometers, including handheld micro-spectrometers. Such a dataset is very useful to compare different tools, although results might be context dependent, i.e., the most accurate model for different traits and species might be obtained with different spectrometers. Still, quantifying the gain or loss of predictive ability for each technique will help guide the choice of one technique over another for a specific objective. Techniques could also be combined to facilitate measurement, for example hyperspectral imaging and laboratory spectra: instead of using a spectrometer to measure samples one by one, hyperspectral images can measure several samples at the same time. This would enhance the robustness of spectra collection and reduce measurement time. Beyond technical issues, it is also important to consider practical organizational questions, such as the period at which spectra are measured, to make sure that the predictions are available before the sowing of the next season.
2.2 Preprocessing NIR Spectra
In ideal conditions, NIR spectra follow the Beer–Lambert law, and the sample absorbance is directly linked to the concentration of chemical compounds in the sample. In practice, however, many factors independent of the sample composition influence the measured absorbance. This is the case, for instance, of temperature or granulometry, which deform the final spectrum and bias spectra comparison. To deal with such external effects, a mathematical correction or preprocessing can be applied, as illustrated in Fig. 2 for spectra collected on bread wheat grains. Two main external effects usually need to be corrected: additive and multiplicative effects. In additive effects, noise affects spectra irrespective of the wavelength and usually yields a baseline shift, which can be corrected with a detrend [28] or a derivation (Fig. 2c), typically carried out through a Savitzky–Golay filter, which relies on polynomial smoothing [29]. The baseline shift appears when the absorbance increases with the wavelength due to the increased light intensity. Multiplicative effects typically affect spectra differently depending on the wavelength and are usually linked to an increase of the distance crossed by the photons (due to different granulometry, for example). They can be corrected by a normalization (Fig. 2b). This effect is present when the variability is low at wavelengths with low absorbance and high at wavelengths with high absorbance. Other preprocessing techniques have also been proposed to specifically deal with an external parameter known to bias spectra, such as temperature or hygrometry. This is the case, for instance, of the method called External Parameter Orthogonalization (EPO, [30]).
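The corrections described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming spectra are stored as the rows of a matrix on an evenly spaced wavelength grid. The function names, the window length, and the polynomial degrees are illustrative choices, not settings taken from the cited studies, and standard normal variate (SNV) is used here as one common normalization among several.

```python
import numpy as np
from math import factorial

def snv(spectra):
    """Normalization (SNV): center and scale each spectrum (row) by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def detrend(spectra, degree=2):
    """Detrend: subtract a polynomial baseline fitted to each spectrum by least squares."""
    x = np.linspace(-1.0, 1.0, spectra.shape[1])  # rescaled wavelength index
    V = np.vander(x, degree + 1)
    coef, *_ = np.linalg.lstsq(V, spectra.T, rcond=None)
    return spectra - (V @ coef).T

def savgol_derivative(spectra, window=11, polyorder=2, deriv=1):
    """Savitzky-Golay derivative: local polynomial fit in a sliding window (odd length).

    Returns only the positions where the full window fits, so each output
    spectrum has n_wavelengths - window + 1 points, in units per index step.
    """
    half = window // 2
    z = np.arange(-half, half + 1)
    A = np.vander(z, polyorder + 1, increasing=True)  # columns z^0, z^1, ...
    weights = factorial(deriv) * np.linalg.pinv(A)[deriv]
    return np.apply_along_axis(
        lambda s: np.correlate(s, weights, mode="valid"), 1, spectra
    )
```

In practice these steps are often chained (for example, normalization followed by a first derivative), and the best chain is data dependent.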
The preprocessing methods briefly introduced in the previous paragraph were developed in the chemometrics literature. They are routinely and widely used when applying NIRS in the classical way, i.e., to predict the composition of end-products. In the context of breeding and PS, further preprocessing taken from the breeding literature can be carried out to improve the ability of the spectra to predict genetic values. Such preprocessing includes fitting a model to the absorbance or reflectance at each wavelength that accounts for the effects of the experimental design (e.g., blocks or spatial effects) together with genetic effects, in order to extract genotypic values [25, 26]. Genotypic values may be BLUEs or BLUPs depending on whether the genotype effect is considered as fixed or random in the model. This step stems from the fact that PS is carried out at the genotype level rather than at the individual or plot level, and consequently one needs to obtain a unique NIRS matrix at the genotype level for model training and prediction. It is interesting to note that if the entire spectrum is considered rather than the absorbance or reflectance at given wavelengths, such corrections are related to the orthogonalization approaches from the chemometrics literature. Indeed, Ryckewaert et al. [31] recently proposed to make use of spectra replicates, typically obtained when characterizing plants in genetic trials, to reduce the repeatability error. They developed a new preprocessing technique based on orthogonalization after an ANOVA–simultaneous component analysis (REP-ASCA).
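As an illustration of the wavelength-by-wavelength adjustment, the sketch below fits an ordinary least squares model with fixed genotype and block effects to every wavelength at once and returns one adjusted spectrum per genotype. It is a simplified stand-in for the mixed models used in practice: the function name and the treatment coding are assumptions, spatial effects are ignored, and the estimates are expressed at the level of the first block.

```python
import numpy as np

def genotype_blues(spectra, genotype, block):
    """Fit absorbance ~ genotype + block by OLS at every wavelength.

    spectra  : (n_plots, n_wavelengths) preprocessed spectra
    genotype : (n_plots,) genotype label of each plot
    block    : (n_plots,) block label of each plot
    Returns the sorted genotype labels and an (n_genotypes, n_wavelengths)
    matrix of genotype estimates (first block taken as reference).
    """
    geno_levels, g_idx = np.unique(genotype, return_inverse=True)
    block_levels, b_idx = np.unique(block, return_inverse=True)
    Zg = np.eye(len(geno_levels))[g_idx]           # one column per genotype
    Zb = np.eye(len(block_levels))[b_idx][:, 1:]   # drop first block (reference)
    X = np.hstack([Zg, Zb])
    coef, *_ = np.linalg.lstsq(X, spectra, rcond=None)
    return geno_levels, coef[: len(geno_levels)]
```

The resulting genotype-level matrix can then be centered and scaled before computing a similarity matrix. Treating genotype as random (BLUPs) would require a mixed-model solver instead of plain least squares.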
The filters mentioned above are not an exhaustive list but have been the most commonly used in NIRS chemometric prediction. Preprocessing is done in many different ways across studies, suggesting that no single standard approach exists. We have noticed that PS predictions are influenced by the preprocessing applied to the spectra. Consequently, we recommend testing different filters on a subset of the data and comparing their efficiency by cross-validation before carrying out deeper analyses.
2.3 Statistical Models for Phenotype Prediction
NIRS reflectances or absorbances are quantitative variables, like bi-allelic markers, which are usually coded numerically as allelic dosages. Consequently, essentially all models developed or used in the frame of genomic selection can also be used for PS, from the “infinitesimal” model to Bayesian models with various prior distributions or machine learning methods.
One such reference model for PS is the H-BLUP, similar to G-BLUP but with a similarity matrix (H) estimated with NIRS [1, 23, 25]. Different kernels can be used within such a framework, including the Gaussian kernel or the arc-cosine kernel [24]. As with molecular markers, this model can be equivalent to a ridge regression on the wavelengths, provided the H matrix is computed accordingly, as demonstrated hereafter. The predictive ability of the H-BLUP model can be measured with cross-validation, as with G-BLUP or other GS models.
The H-BLUP model is defined as \( y=\mu +u+e \), with \( \operatorname{var}(y)=H{\sigma}_u^2+I{\sigma}_e^2 \), where y is a vector of phenotypes, H is the NIR spectra-based similarity matrix as defined above, μ is the intercept, and u and e are random genetic and residual effects, respectively. The RRN-BLUP model (Ridge Regression NIRS BLUP) is defined as \( y=\mu +Sv+e \), with \( \operatorname{var}(y)=S{S}^{\prime }{\sigma}_S^2+I{\sigma}_e^2 \), where S is the matrix of preprocessed, centered, and scaled NIRS as defined above and v is the vector of random wavelength effects with variance \( {\sigma}_S^2 \). The mean of y is equal to μ in both models, thus H-BLUP and RRN-BLUP are equivalent if \( H{\sigma}_u^2=S{S}^{\prime }{\sigma}_S^2 \), which is for instance the case when \( H=\frac{S{S}^{\prime }}{n_w} \) and \( {\sigma}_S^2=\frac{\sigma_u^2}{n_w} \), with \( n_w \) the number of wavelengths.
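The equivalence can be checked numerically. The sketch below is a toy example on simulated data, assuming known variance components and an intercept set to the phenotypic mean (in practice these would be estimated, for example by REML). All values and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_w = 20, 50                              # individuals, wavelengths
S = rng.normal(size=(n, n_w))
S = (S - S.mean(axis=0)) / S.std(axis=0)     # centered and scaled spectra
y = rng.normal(size=n)
mu = y.mean()                                # intercept, treated as known here
sigma_u2, sigma_e2 = 1.0, 0.5                # assumed variance components

# H-BLUP: var(y) = H * sigma_u2 + I * sigma_e2
H = S @ S.T / n_w
V_h = H * sigma_u2 + np.eye(n) * sigma_e2
u_hat = sigma_u2 * H @ np.linalg.solve(V_h, y - mu)

# RRN-BLUP: var(y) = S S' * sigma_s2 + I * sigma_e2, with sigma_s2 = sigma_u2 / n_w
sigma_s2 = sigma_u2 / n_w
V_r = S @ S.T * sigma_s2 + np.eye(n) * sigma_e2
v_hat = sigma_s2 * S.T @ np.linalg.solve(V_r, y - mu)

# Same covariance structure, hence same predicted values
print(np.allclose(V_h, V_r), np.allclose(u_hat, S @ v_hat))
```

Working with H requires solving an n × n system, whereas the ridge form gives direct access to per-wavelength effects; which form is more convenient depends on whether n or the number of wavelengths is larger.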
Functional regression models seem particularly interesting for PS, as they model the linear trend of the spectra [22]. Different kinds of functional regressions have been proposed, such as functional B-spline, functional Fourier [22], and Bayesian functional [32] regressions. H-BLUP and functional regression models have proven to yield accurate predictions while reducing computational time by diminishing the number of parameters to estimate. This could be important if several spectra from different environments are available, resulting in a high number of predictors.
Partial Least Squares (PLS) regression, classically used in chemometrics, or variable selection approaches (such as LASSO or BayesB) can also be used to tackle multicollinearity and high dimensionality. PLS regression condenses the information contained across all wavelengths into a few orthogonal variables that maximize the covariation between the predictor matrix and the response variable. In LASSO and BayesB, it is assumed that only a portion of the variables has an effect on the trait. Variable selection seems promising for PS, because the spectrum could be restricted to its most heritable parts [25, 33]. However, it should be noted that the preselection of wavelengths using vegetation indices or knowledge of the genomic heritability of the wavelengths generally results in lower prediction accuracies than using the full spectrum [17, 25].
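To make the PLS step concrete, here is a minimal sketch of single-response PLS (PLS1) using the classical NIPALS deflation scheme. It is meant to show how latent variables are extracted, not to replace an established implementation. The function name is illustrative, and the number of components would normally be chosen by cross-validation.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS: returns regression coefficients and intercept.

    X : (n_samples, n_wavelengths) spectra, y : (n_samples,) phenotype.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                    # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                       # latent variable (scores)
        tt = t @ t
        p = Xr.T @ t / tt                # X loadings
        q_a = yr @ t / tt                # y loading
        Xr = Xr - np.outer(t, p)         # deflate X and y
        yr = yr - q_a * t
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Usage on new spectra X_new: y_pred = X_new @ coef + intercept
```

With as many components as the rank of the centered spectra matrix, PLS1 reproduces ordinary least squares; the benefit with collinear wavelengths comes from keeping only a few components.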
In GS, the choice of the prediction model can be guided by the expected genetic architecture of the predicted trait. The choice of a PS model adapted to a given trait cannot yet rely on such assumptions, and it is not clear how the optimal prediction model can be related to the trait characteristics. The various models tested in the literature sometimes resulted in contrasted prediction accuracies, but in general, sophisticated models were not better than a simple H-BLUP. Models relying on a mixture of distributions such as BayesR [34] are accurate for contrasted genetic architectures in GS; it would be interesting to test them in PS. In any case, alternative prediction models should be compared using cross-validation within the calibration set.
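A cross-validation comparison of this kind can be sketched as follows for an H-BLUP-type predictor. For brevity the ratio of residual to genetic variance (`lam`) is fixed by the user rather than re-estimated in each training fold, which is an assumption of this sketch; the function names and the number of folds are also illustrative.

```python
import numpy as np

def hblup_predict(H, y, train, test, lam):
    """Predict test individuals from training phenotypes and similarity matrix H.

    lam is the assumed ratio sigma_e^2 / sigma_u^2.
    """
    mu = y[train].mean()
    H_tt = H[np.ix_(train, train)]
    alpha = np.linalg.solve(H_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + H[np.ix_(test, train)] @ alpha

def cv_predictive_ability(H, y, lam, n_folds=5, seed=0):
    """K-fold cross-validation; returns the correlation between observed and predicted."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_hat = np.empty(len(y))
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        y_hat[test] = hblup_predict(H, y, train, test, lam)
    return np.corrcoef(y, y_hat)[0, 1]
```

Calling `cv_predictive_ability` with similarity matrices built from different preprocessings, kernels, or spectra, on the same folds, gives a like-for-like comparison of the candidate models.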
Contrary to molecular markers in GS, in PS several spectra, corresponding to different replicates of the genotypes and possibly to different environments, can be available to build predictive models. In this case, one possibility for calibration is to test each spectrum in order to determine the one that yields the most accurate predictions. Another possibility is to make use of all the available spectra. Lane et al. [26] proposed, in the frame of the H-BLUP model, to compute the mean of the relationship matrices calculated from each spectrum individually. It is noteworthy that this proposition is equivalent to computing the relationship matrix from a large combined spectra matrix, provided that the individual spectra matrices have the same number of wavelengths, as shown hereafter.
The similarity matrix \( {H}_T\left(i,j\right) \) computed with the combined spectra matrix \( {S}_T \) (in which all spectra matrices are placed side by side) is given by:
$$ {H}_T\left(i,j\right)=\frac{\sum_{k=1}^{n_t}\left[{S}_T\left(i,k\right)\times {S}_T\left(j,k\right)\right]}{n_t} $$
$$ {H}_T\left(i,j\right)=\frac{\sum_{p=1}^{n_w}\left[{S}_1\left(i,p\right)\times {S}_1\left(j,p\right)\right]+\dots +\sum_{p=1}^{n_w}\left[{S}_{n_l}\left(i,p\right)\times {S}_{n_l}\left(j,p\right)\right]}{n_l\times {n}_w} $$
$$ {H}_T\left(i,j\right)=\frac{1}{n_l}\times \left[\frac{\sum_{p=1}^{n_w}\left[{S}_1\left(i,p\right)\times {S}_1\left(j,p\right)\right]}{n_w}+\dots +\frac{\sum_{p=1}^{n_w}\left[{S}_{n_l}\left(i,p\right)\times {S}_{n_l}\left(j,p\right)\right]}{n_w}\right] $$
$$ {H}_T\left(i,j\right)=\frac{1}{n_l}\times \left[\sum_{u=1}^{n_l}{H}_u\left(i,j\right)\right] $$
\( {S}_T \) has dimension n (number of individuals) times \( {n}_t={n}_l\times {n}_w \), with \( {n}_l \) the number of spectra (e.g., number of environments in which NIRS was acquired) and \( {n}_w \) the number of wavelengths of each spectrum (we consider that all spectra have the same wavelengths). \( {S}_u\left(i,p\right) \) is the absorbance or reflectance measured on the ith individual for the pth wavelength in the uth preprocessed NIR spectrum. \( {S}_u \) has dimension n (number of individuals) times \( {n}_w \) (number of wavelengths). \( {H}_u\left(i,j\right) \) is the similarity between individuals i and j estimated with the uth spectrum.
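This identity is easy to verify numerically. The following sketch uses random matrices as stand-ins for preprocessed, centered, and scaled spectra; the dimensions and names are arbitrary.

```python
import numpy as np

def similarity(S):
    """H = S S' / (number of columns of S)."""
    return S @ S.T / S.shape[1]

rng = np.random.default_rng(1)
n, n_w, n_l = 10, 30, 3                     # individuals, wavelengths, spectra
spectra = [rng.normal(size=(n, n_w)) for _ in range(n_l)]

H_mean = sum(similarity(S_u) for S_u in spectra) / n_l   # mean of individual H_u
H_T = similarity(np.hstack(spectra))                     # H from the combined matrix

print(np.allclose(H_T, H_mean))
```

If the spectra had different numbers of wavelengths, the combined matrix would weight each spectrum by its number of columns, and the two computations would no longer coincide.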
2.4 Relative Performance of PS Versus GS
There are very few studies that compare PS (and in particular GLOB selection) with GS (Table 2). Although Lane et al. [26] was one of the two studies that implemented GLOB prediction with spectra following our definition, it could not be included in this comparison because they did not apply GS. Table 2 illustrates that PS and GLOB selection have been mainly implemented on cereal species, probably because of the widespread and routine use of NIR measurements on grains to predict protein content. Krause et al. [23] and Galán et al. [25] reported similar accuracies for GS and PS, while Cuevas et al. [24] showed lower accuracies for PS compared to GS (0.37 and 0.46, respectively). The highest PS accuracy relative to GS was observed in Rincent et al. [1] in wheat. The lowest PS accuracy relative to GS was observed in Rincent et al. [1] in poplar, for which NIR spectra were collected on wood for a reduced range of wavelengths. From these data, it is apparent that PS had comparable or higher accuracies than GS in most cases. Even in cases where PS is less accurate than GS, because NIR measurements are high throughput and low cost compared to genotyping, PS could still provide higher genetic gains than GS, as demonstrated in Rincent et al. [1]. In our ongoing research, we compared GS and PS at different generations of elite bread wheat selection. We found that PS could be as accurate as GS, and even better when applied to early generations. Further work on other species is clearly needed to deepen this comparison and provide valuable information on the factors and conditions (e.g., tissue, environment) that determine the predictive ability of NIRS.
By considerably reducing the costs of implementation, PS is a tool of choice to improve the balance between costs and benefits in comparison with GS. PS would be particularly valuable for orphan crops for which genotyping is expensive, or for major crops for which phenomic data are already routinely collected (e.g., maize and wheat). In the latter case, phenomic prediction already opens new possibilities in existing breeding programs without any additional cost, and with predictive abilities similar to those obtained with genomic prediction [1].
2.5 Factors Affecting PS Predictive Ability
In the past, several kinds of omics were used to make genomic-like predictions with promising results [8]. NIRS captures an integrative signal and is biologically more difficult to interpret than other omics, which describe each molecule individually (e.g., transcriptomics, proteomics, metabolomics). However, because prediction models do not necessarily need to be interpreted biologically, NIRS can be used to make predictions using “black-box” models.
There are two factors that contribute to the predictive ability and consequently to the success of PS: (1) the ability to capture target trait proxies and (2) the ability to infer genetic relatedness. The former depends on the physiological connectedness between the target trait and the composition and features of the tissue analyzed with NIRS. This is for example the case when NIR spectra collected on wood powder are used to predict wood properties, or when NIR spectra collected on fruits are used to predict fruit composition. In these cases, PS should be nearly equivalent or superior to the traditional way of using NIRS (prediction of the tissue composition), the only difference being that in PS we usually work at the genotype level, because we aim at ranking and selecting the best genotypes, while in the traditional use of NIRS predictions are made at the plot or plant level [26]. However, more indirect relationships between the target trait and NIR spectra could also explain the predictive ability. For instance, in wheat the good predictive ability of PS for yield could be due to the fact that NIRS is a very good predictor of grain composition, which is often negatively correlated with yield. This could also be the case for maturity: the spectra are influenced by the maturity of the plants, and this maturity is sometimes correlated with yield [23]. It is nevertheless important to stress that even in the absence of any direct relationship between the predicted trait and the tissue analyzed with NIRS, PS can still be accurate. It was for instance shown in Rincent et al. [1] that NIR spectra collected on leaves in one environment could be used to estimate a covariance matrix resulting in accurate prediction of yield in a completely independent environment. In this particular example, the correlation between yield in the environment in which NIR spectra were obtained and yield in the predicted environment (by cross-validation within the predicted environment) was as low as 0.16, whereas PS predictive ability was above 0.5. This means that NIRS-derived relationship matrices were able to capture genetic relatedness between lines that is valuable for predicting yield. This was further demonstrated by the fact that genomic heritability was significant for many wavelengths. A further demonstration could be done with a simulation study, by estimating the predictive ability of PS for traits simulated from genotype data. In this case, the predictive ability of PS averaged over a large number of simulated traits would provide an evaluation of the ability of NIRS to infer genetic relationships for predicting quantitative traits unrelated to the tissue composition.
PS is a recent research topic, and further investigations are required to use it in an optimal way. One can expect that, as for genomic selection, prediction accuracy will be strongly dependent on the target trait and its heritability, as well as on the size and composition of the training set. In comparison to GS, prediction accuracy obtained with PS can also be affected by the origin of the spectrum (tissue, environments, kind of sensors). This is similar to the choice of a SNP array and marker filtering in GS, but the effect of the origin of the spectrum appears to be more pronounced. First results suggest that NIR spectra collected under plant stress conditions are more efficient [1, 26], but other experiments are required before it can be understood whether this is the rule or the exception. An interesting result is that, in practice, the combination of different NIR spectra (collected on different tissues or in different environments) leads to predictive abilities at least as good as those obtained with the best NIR spectrum taken alone ([1], unpublished results). This means that in some cases, it is not necessary to identify the best conditions to obtain NIR spectra, but simply to aggregate all the spectra collected (e.g., spectra obtained on the same genotypes at the different steps of the breeding program). As shown in the present review, aggregating NIRS matrices prior to computing the H matrix is equivalent to averaging H matrices estimated with individual NIRS matrices and is thus quite straightforward. In any case, the choice of the tissue, timing, and sensors could and should also be optimized. We expect that NIR spectra collected on homogeneous, representative samples (leaf powder, seed sample, or flour) are more useful than NIR spectra obtained on a tiny area of raw material. The way NIR spectra are collected should also be optimized in terms of practical feasibility. For instance, in wheat it would be much more feasible to measure NIR spectra during the growing season than on grain after harvest, because the few weeks between harvest and sowing of the next generation are labor intensive, so NIR spectra acquisition would be difficult during this period.