Airborne hyperspectral and Sentinel imagery to quantify winter wheat traits through ensemble modeling approaches

Early prediction of crop production by remote sensing (RS) may help to plan the harvest and ensure food security. This study aims to improve the quantification of yield, grain protein concentration (GPC), and nitrogen (N) output in winter wheat with RS imagery. Ground-truth wheat traits were measured at flowering and harvest in a field experiment combining four N and two water levels in central Spain over 2 years. Hyperspectral and thermal airborne images coincident with Sentinel-1 and Sentinel-2 were acquired at flowering. A parametric linear model using all hyperspectral normalized difference spectral indices (NDSI) and two non-parametric models (artificial neural network and random forest) were used to assess their estimation ability combining NDSIs and other RS indicators. The feasibility of using freely available multispectral satellite was tested by applying the same methodology but using Sentinel-1 and Sentinel-2 bands. Yield estimation obtained the highest R2 value, showing that the visible and short-wave infrared region (VSWIR) had similar accuracy to the hyperspectral and Sentinel-2 imagery (R2 ≈ 0.84). The SWIR bands were important in the GPC estimation with both sensors, whereas N output was better estimated using red-edge-based NDSIs, obtaining satisfactory results with the hyperspectral sensor (R2 = 0.74) and with the Sentinel-2 (R2 = 0.62). When including the Sentinel-2 SWIR index, the NDSI (B11, B3) improved the estimation of N output (R2 = 0.71). Ensemble models based on Sentinel were found to be as reliable as those based on hyperspectral imagery, and including SWIR information improved the quantification of N-related traits.


Introduction
Remote sensing (RS) techniques are a suitable strategy for predicting crop productivity from in-season crop status and for adjusting nitrogen (N) and water requirements, which are essential elements for plant health that determine crop productivity (Gonzalez-Dugo et al., 2009). An optimal N fertilizer rate at the correct time is crucial for reducing groundwater contamination due to NO 3 -N leaching and for avoiding economic loss for farmers (Ottman et al., 2000). Adjusting water delivery to crop demand is important for optimizing water use in a scenario of climate change and for avoiding decreased plant physiological functions (Hoogmoed & Sadras, 2018). Because the final harvest depends on the crop physiological status at early stages, RS estimation of crop status can be used to predict final crop traits in advance (Quemada et al., 2014). Approaches based on RS offer extensive information on the in-season crop status that affects crop productivity, but it is necessary to identify the most suitable and affordable sensor as well as the most sensitive spectral region in order to optimize crop trait prediction (Prey & Schmidhalter, 2019a).
Spectral vegetation indices (VIs) based on canopy reflectance are commonly used to estimate different crop traits depending on the region of the spectrum used (Gabriel et al., 2017). Photosynthetic pigments such as chlorophyll a and b, or carotenoids, absorb light in the visible region of the spectrum (450-680 nm) (Sims & Gamon, 2002) and their content depends on crop N status, among other factors (Heitholt et al., 1991). The region lying between the visible and near-infrared (NIR) regions is called the "red-edge" (680-780 nm), and the shape and position change according to the chlorophyll content levels, which are used as a proxy for crop N status (Cho & Skidmore, 2006); thus, different studies have demonstrated the ability of this region to estimate N content (Inoue et al., 2016). Reflectance at longer wavelengths, such as the NIR (780-1100 nm) and short-wave infrared (SWIR) (1100-3000 nm) regions that penetrate deeper in leaves, is influenced by internal leaf structure and composition, such as plant biochemical content (Serrano et al., 2002). Well-watered plants expand intercellular air space in the spongy mesophyll, increasing the canopy density and therefore the reflectance in the NIR region (Zhao & Nakano, 2018). Water absorption wavelengths located in the SWIR region are commonly used for estimating water (Quemada & Daughtry, 2016;Quemada et al., 2018;Sims & Gamon, 2003). Plant protein content can be assessed with the SWIR region using the absorption feature of N-H bonds (Curran, 1989). Microwave-based indices, such those obtained with the Sentinel-1 synthetic aperture radar (SAR), are useful for vegetation monitoring due to their sensitivity to the dielectric and geometrical properties of the canopy, and they were successfully applied for the estimation of biomass (Sinha et al., 2015), plant growth dynamics, or soil and vegetation water content (Mandal et al., 2020). Solar-induced fluorescence (SIF) emission, which is a proxy of photosynthetic capacity, has been widely used to detect plant stress during the past few decades (Mohammed et al., 2019). The proportion of the emitted light varies with the plant status; therefore, chlorophyll fluorescence can be used for the diagnosis of crop N (Camino et al., 2018) or water status (Zarco-Tejada et al., 2012) due to the reduced photosynthesis reported under stress conditions. Finally, plant temperature is an indicator of water status because plant transpiration decreases under water stress, leading to an increase in leaf and canopy temperature (Jackson et al., 1981). Because plant and soil temperature can be drastically different, the canopy temperature measured with remote sensors can be affected by the soil fractional cover (Shivers et al., 2019). For this reason, Moran et al. (1994) developed the water deficit index (WDI) to quantify plant water stress using land surface temperature and in-situ-collected climate data while accounting for the vegetation fractional cover using a spectral VI.
The quasi-continuous spectrum acquired by hyperspectral sensors is more sensitive than broader-band multispectral sensors and offers more information that can be useful for crop monitoring . However, hyperspectral sensors are less affordable and have spatial coverage limitations (Dian et al., 2021). For these reasons, freely accessible multispectral satellite images for crop monitoring (Zhao et al., 2019) or yield forecasting 1 3 (Gómez et al., 2019) are receiving increasing attention. Moreover, convolution of Sentinel-2 bands from ground-truth hyperspectral information confirmed the high accuracy of reflectance acquired by satellite imagery and, therefore, the potential for crop trait estimation (Pancorbo et al., 2021a;Prey & Schmidhalter, 2019a).
Using parametric statistical models that combine several RS indicators to estimate crop traits is a common practice and has provided reliable results for many crops, including wheat (Prey & Schmidhalter, 2019a;Raya-Sereno et al., 2021a;Zhao et al., 2019), maize (Leroux et al., 2019;Quemada et al., 2014), or rice (Liu & Sun, 2016). However, the error committed in the prediction is still high for many crop traits (Colaço & Bramley, 2018;Quemada et al., 2014;Raun et al., 2005), and particularly for those related to crop N status and grain quality Rodrigues et al., 2018). Non-parametric models, such as random forest (RF) (Breiman, 2001) or artificial neural network (ANN) (Rumelhart et al., 1986), are expected to improve crop trait prediction thanks to their ability to find patterns, extract information, and build high-performance predictive models from big datasets (Van Klompenburg et al., 2020). Because of this, the number of studies that combine RS data with machine learning (ML) techniques to estimate yield is increasing each year (Ma et al., 2019;Van Klompenburg et al., 2020). However, more studies using ML to estimate N-related traits are needed, particularly grain quality (Aranguren et al., 2020;Prey & Schmidhalter, 2019a;Raya-Sereno et al., 2021b).
This study aimed to improve the estimation of winter wheat traits (yield, grain protein concentration (GPC), and N output) with remote sensing imagery by evaluating airborne hyperspectral and satellite multispectral sensors. The specific objectives were (i) to determine the best spectral regions for assessing each winter wheat trait, (ii) to quantify the improvement in the estimation when combining indices related to different crop biophysical parameters, and (iii) to analyze the feasibility of using indices derived from the freely available Sentinel-1 and -2 to estimate winter wheat traits.

Field experiment
A field experiment with winter wheat (Triticum aestivum L.) was carried out at La Chimenea research station (40° 04′ N, 03° 32′ W, 550 m a.s.l.), central Spain, during the 2018 and 2019 growing seasons. The climate of the area is classified as cold semi-arid (Bsk) according to the Köppen classification. The mean annual temperature is 14.2 °C and the rainfall is 350 mm. Spring and summer are characterized by low precipitation. However, the 2018 experimental year was unusually wet (342 mm from 1 November 2017 to 20 July 2018).
Each year, the study site was a different quarter of a field irrigated by a circular pivot (220 m radius) that enables an adjustable and uniform water delivery. The winter wheat was homogeneously sown in November of the previous year at a seeding rate of 220 seeds ha −1 . Soil uniformity and low levels of soil N inorganic content was ensured by establishing a maize crop (Zea mays L.) before both wheat experiments that did not receive N fertilizer. In addition, no organic amendments were applied during the 4 years before the experiments. A factorial experiment was established in 32 plots (22 × 22 m in 2018 and 25 × 25 m in 2019) marked with real-time kinematic (RTK) Global Navigation Satellite System (GNSS) observations for both years. The plots were randomly assigned to four N levels and two water levels, with four replications (Fig. 1). The N levels were established by applying different amounts of N fertilization (calcium ammonium nitrate) at tillering (66%) and at stem elongation (33%). Before the first N application, soil samples were collected to 0.6-m depth at 0.2-m intervals to determine the soil inorganic N content (kg N ha −1 ) and to adjust the amount of N fertilization accordingly. The soil inorganic N content was 36 kg N ha −1 in 2018 and 57 kg N ha −1 in 2019. The N fertilizer rates applied to each N level were 0, 50, 100, and 150 kg N ha −1 for N0, N1, N2, and N3, respectively, in 2018 and 0, 42, 92, and 142 kg N ha −1 in 2019. Therefore, the four N levels corresponded to N0 (no N application), N1 (N stress), N2 (recommended N rate), and N3 (overfertilized).
The two water levels were established at the beginning of flowering (growth stage (GS) 63 for both years) by irrigating half of the plots. The plots receiving this irrigation are referred to in the text as "W2" and the others as "W1". The amount of water delivered in the W2 plots of the 2018 experimental year was 25 mm on May 8. An accumulated rainfall of 46 mm between May 24 and 29 replenished soil water storage, mitigating crop water stress. Two irrigation events were conducted at flowering in 2019 in the W2 plots: on May 7 (30 mm) and May 10 (9 mm). More information on the experiment can be found in Pancorbo et al. (2021b).

Crop data
The effect of N and water treatments on winter wheat was determined by analyzing two samples of aerial biomass per plot at flowering and analyzing the grain at harvest. The biomass samples (0.5 × 0.5 m) used in this study were collected 3 and 4 days after the last W2 irrigation event in both years: on 11 May 2018 and 13 May 2019 (GS65) ( Table 1). To measure yield, all plots were harvested on 20 July 2018 and 3 July 2019 with a 1.4-m-wide combiner. A 1-m buffer was left at both ends of the plots to avoid edges. Simultaneously, a grain sample of each plot was saved for analysis. The biomass and grain samples were dried at 65 °C for 48 h and weighed to determine the moisture content and the aboveground biomass (kg dry matter ha −1 ) and yield (kg grain ha −1 ). Subsamples of each biomass and grain samples were analyzed in the laboratory to determine the N concentration (%N) by using the Dumas combustion method (LECO FP-428 analyzer, St. Joseph, MI, USA). The grain protein concentration (GPC; %) of each plot was calculated from the grain N concentration, and the N output (kg N ha −1 ) was calculated by multiplying yield by grain N concentration.

Airborne campaigns
Two hyperspectral sensors covering the visible-NIR and a portion of the NIR-SWIR region, together with a thermal sensor, were installed in tandem on a Cessna aircraft that flew 300 m above the experiment site at 75 km h −1 ground speed, heading on the solar plane at 11 GMT in both years. The two flights were conducted at midday to minimize the effects produced by different sun illumination angles. Airborne campaigns were operated by the Laboratory for Research Methods in Quantitative Remote Sensing of the Consejo Superior de Investigaciones Científicas (QuantaLab, IAS-CSIC, Spain) as close as possible to biomass collection ensuring cloud-free sky conditions. The flights took place 4 and 6 days after the W2 irrigation event (15 May 2018, 16 May 2019; GS65 both years) ( Table 1).
The hyperspectral imager covering the VNIR region (Hyperspec VNIR model, Headwall Photonics, Fitchburg, MA, USA) captured the reflected light between 400 and 850 nm with a spectral resolution of 6.5 nm full-width at half maximum (FWHM) and 50° field of view (FOV) that yielded a spatial resolution of 0.2 m. Reflectance in the SWIR region was obtained with a hyperspectral sensor (NIR-100 model, Headwall Photonics, Fitchburg, MA, USA) from 950 to 1750 nm at 6.05 nm FWHM, with an FOV of 38.6° and 0.6 m spatial resolution. The radiometric calibration of the hyperspectral cameras was conducted with an integrating sphere (CSTM-USS-2000C LabSphere, North Sutton, NH, USA) using four levels of illumination and six integration times. The atmospheric correction was made by measuring incoming irradiance with a field spectrometer concurrently with the flights and also simulated by the SMARTS model using aerosol optical depth (AOD) and weather data (Gueymard, 2001).
Smoothing of the spectral data was performed following the Savitzky-Golay method with a filter length of 9 interpolated to 1 nm.
The surface temperature was recorded with a thermal sensor (SC655 model, FLIR Systems, Wilsonville, OR, USA) at a spatial resolution of 0.25 m, 16-bit radiometric resolution, focal length of 13.1 mm, and 45 × 33.7° FOV in each flight. Thermal imagery was calibrated using ground temperature data collected with a handheld infrared thermometer (LaserSight, Optris, Germany).
The data of each plot were extracted from the orthorectified images using the RTK GNSS coordinates and leaving a 2-m buffer in each site of the plots to ensure representativeness. Average spectra and temperature values were obtained from each plot.

Satellite datasets
The satellite imagery used was the multispectral sensor carried by the Sentinel-2 and the synthetic aperture radar (SAR) instrument onboard the Sentinel-1. The Sentinel imagery was downloaded from the European Space Agency (ESA) DataHUB server (ESA, 2022a). The Sentinel-2 payload is the multi-spectral instrument (MSI), which is a push-broom sensor with a swath width of 290 km. The MSI provides 13 spectral bands from the visible to SWIR regions at different spatial and spectral resolutions (Supplementary Material S. 1). The products selected were those with the overpass closest to the biomass collection campaigns, ensuring cloud-free sky conditions over the experimental site in both years ( Table 1). The products downloaded were the Sentinel-2 Level 2A, which indicates that the atmospheric corrections were automatically made by the payload data ground segment (PDGS) with the Sen2Cor procedure using atmospheric constituents derived from in-scene spectral bands. Geometric alignment of the imagery was conducted using the airborne images as reference.
Because no pure pixels of the Sentinel-2 20-m bands were available for all plots, the Sentinel-2 bands were convolved using the reflectance spectra collected with the aircraft. To validate the convolved bands, first, the 10-m Sentinel-2 bands were resampled to 20 m, which resulted in 59 pixels per year in the study site. For each Sentinel-2 pixel, the average spectrum of the airborne imagery was extracted and convolved using the spectral response function (ESA, 2022b). The coefficients of the regression line of the linear relationship between the Sentinel-2 and the convolved bands were applied to each convolved band to increase similarities between the real Sentinel-2 and the convolved bands. The resulting bands were used to calculate the Sentinel-2 spectral indices applied in this study. Finally, the validation process was performed by analyzing the linear relationship between the NDSIs calculated with Sentinel-2 and with the convolved bands (n = 118).
This study tested the performance of combining Sentinel-1 and Sentinel-2 information for winter wheat trait estimation. Sentinel-1 provides cross-orbit images of dual-polarized (VV-VH) backscatter in the C band (central frequency of 5.405 GHz) in the ascending and descending orbits. In this study, the Sentinel-1 ground range detected (GRD)-products were downloaded from ESA (2022a). Radiometric correction was applied to obtain the co-and cross-polarized backscatter, (σ VV 0 and σ VH 0 (db)) using the Sentinel Application Platform (SNAP) (Mandal et al., 2020). The modification of the quad-pol radar vegetation index (RVI) (Kim & Van Zyl, 2009) for dual-pol SAR data was calculated as 4σ 0 VH/ (σ 0 VV + σ 0 VH) (Trudel et al., 2012) and extracted the value for each plot.

Selection of indices as proxy of crop biophysical parameters
The winter wheat trait estimation capacity when using one spectral vegetation index was compared against combining different indices with ensemble models. For each trait, the indices included in the ensemble models were selected according to their link with the traits and with specific crop biophysical parameters, and they were grouped according to the sensor required to calculate it. Each group of indices was added one at a time to the ensemble models to quantify the potential improvement in the estimation (Fig. 2).
The mean airborne spectrum of each plot was used to construct a contour map for each winter wheat trait. The contour maps represent the R 2 value from the linear regression between each trait and each normalized difference spectral index [NDSI (λ 1 , λ 2 ) = (λ 1 − λ 2 )/(λ 1 + λ 2 )] calculated with a combination of all possible hyperspectral bands (λ) when λ 1 > λ 2 and λ ∈ [400, 1750 nm]. From each contour map, the NDSI with the highest R 2 value was selected and used as benchmark to test the potential improvement when combining several indices. The first sensor tested with the ensemble models was the hyperspectral VNIR. For this, a canopy structure-related and a chlorophyll-related NDSI were selected from this region. The structural NDSI was selected among those based on an NIR and a visible band (Rondeaux et al., 1996). The chlorophyll-related indices were selected among the NDSIs based on an NIR and a red-edge band, two bands in the red-edge, or two bands in the visible region (Prey & Schmidhalter, 2019b). The links between the selected NDSIs with the canopy structure, and chlorophyll content were verified by analyzing their linear relationship with aboveground biomass and plant %N, respectively. The second sensor included was a hyperspectral sensor that covers the SWIR region. To this end, an NDSI based on one or two SWIR bands was selected and included in the ensemble models. Collinearity between the selected NDSIs was avoided by ensuring a Pearson correlation coefficient of ≤ 0.75 when possible. The third analysis tested the performance when a highresolution VNIR hyperspectral imager and the radiance information are also available. For this purpose, the solar-induced chlorophyll fluorescence (SIF) emission was calculated by the Fraunhofer line-depth (FLD) principle using the solar irradiance and the radiance emitted by the canopy in the atmospheric O 2 absorption features at 760.5 nm (Meroni et al., 2010). The FLD method used the radiance L in (L 762 nm) and L out (L 750 nm) as well as the irradiance E in (E 762 nm) and E out (E 750 nm) from the irradiance spectra measured at the time of the flights. The last sensor included in the analysis was a thermal camera because of its capacity to determine the crop water status . The water deficit Fig. 2 Workflow followed in this study. VNIR refers to a normalized difference spectral index (NDSI) based on the 400-1000-nm region. VSWIR indicates an NDSI with at least one band in the 1000-1750-nm region. Chl and Stru indicate an NDSI related to chlorophyll content and canopy structure, respectively. SIF, WDI, and RVI indicate solar-induced fluorescence, water deficit index, and radar vegetation index, respectively. MLR, ANN, and RF refer to the ensemble models multiple linear regression, artificial neural network, and random forest. GPC indicates grain protein concentration (%) index (WDI) was calculated based on the vegetation index-temperature trapezoid (VIT) using the soil adjusted vegetation index (SAVI; Huete et al., 1988) as a proxy of ground cover (Moran et al., 1994).
The estimation capacity using satellite information was tested by applying the methodology described earlier but using Sentinel-1 and Sentinel-2 to calculate the indices (Fig. 2). The structural, chlorophyll, and SWIR indices used as input variables for estimating wheat traits with the airborne hyperspectral sensor were similarly calculated using the closest Sentinel-2 convolved bands. The B8A band was not used because its spectral region (855-875 nm) was in the gap between the regions covered by the VNIR (400-850 nm) and the SWIR (950-1750 nm) sensors installed on the aircraft. The B11 was used as the SWIR band in all SWIR-based indices because the B12 region was not covered by the aircraft spectral range (Supplementary Material S. 1). The RVI calculated with Sentinel-1 was included in the analysis to quantify the potential improvement when using the combination of Sentinel-1 and Sentinel-2 for winter wheat trait estimation. The SIF and the WDI cannot be calculated with the Sentinel dataset and, therefore, were not included in the Sentinel analysis.

Ensemble models for winter wheat trait estimation
The ensemble models used to quantify the potential improvement in the estimation when combining different sensors/indices were (i) multiple linear regression (MLR), (ii) artificial neural network (ANN), and (iii) random forest (RF). The tenfold cross-validation resampling technique was used with a random subset of 70% of the plots for training and the remaining 30% for testing. The training dataset was used for calibrating and optimizing the models. The testing dataset was used to evaluate the model transfer learning ability by calculating the coefficient of determination (R 2 ) and the root mean square error (RMSE) between the measured traits and the estimated values at each fold.
The performance of the MLR model was evaluated by, firstly, fitting the training dataset to the crop trait analyzed. Secondly, the equation of the linear regression was used with the testing dataset. Finally, the linear relationship between the predicted and the observed crop trait was analyzed.
The ANN model was built using the back-propagation algorithm. This model consists of a network composed of one input layer, one or more hidden layers, and one output layer connected by neuron-like units. Each connection has a weighting factor. During the training process, the back-propagation algorithm repeatedly adjusts the weighting factors to minimize the mean square error (MSE) between the output and the estimated parameter (Rumelhart et al., 1986). The ANN model was executed using the "neuralnet" package implemented in the R software (version 4.0.5; R Core Team, 2021). In this study, the ANN was run by setting the number of hidden layers as 2 and the number of neurons in the hidden layer equal to the number of input variables.
The RF is an ML model based on multiple decision trees (Breiman, 2001). The RF regression was conducted using the "randomForest" R package (Liaw & Wiener, 2002). The training dataset was used to optimize the model by selecting the optimal number of regression trees (ntree) and the number of variables included at each node (mtry). The most suitable value of ntree was calculated by varying it from 50 to 1000 with 50 intervals while fixing mtry as default (500). The mtry value selected was optimized by varying mtry from 1 to the number of input variables minus 1 with a single interval, while setting ntree as the optimized value. For the optimization process, the ntree and mtry variables selected 1 3 were those that obtained the minimum MSE using the training dataset. The RF model also quantified the importance of each input variable in the estimation as the increase in node purity (IncNodePurity; Dube et al., 2019). This index measures the increase in MSE when permuting the out-of-bag (OOB) portion of the data (Liaw & Wiener, 2002). The most important variables have higher IncNodePurity value; therefore, this index was used to rank the input variables according to their importance in the estimation.

Winter wheat traits
Higher levels of N and water led to an increase in biomass and plant %N at the flowering stage, with more noticeable effects in 2018. Biomass showed a correlation with yield (R 2 = 0.48) and with N output (R 2 = 0.53), but not with GPC. On the other hand, plant %N was linked to GPC (R 2 = 0.69) and to N output (R 2 = 0.49), but not to yield.
Yield response to N fertilization was stronger in 2018 than in 2019. All N levels obtained more yield in 2018, and the differences between N levels were larger that year (Fig. 3a). Water levels did not affect yield for any N level and year (P ≥ 0.05). Yield increased with N application following a quadratic plateau model in both years. This fit resulted in significant differences between all N levels, except for N2 and N3 in 2018 (P ≤ 0.05). In fact, for 2018 the optimal N fertilizer rate was 213.7 kg N ha −1 , indicating that the plateau was not reached by the N3 plots (186 kg N ha −1 ). By contrast, for 2019, the yield showed differences in the control treatment (N0), reaching the plateau with a smaller N fertilization dose (127.8 kg N ha −1 ). Therefore, this finding showed that N2 (149 kg N ha −1 ) and N3 (199 kg N ha −1 ) plots were over-fertilized.
Contrary to yield, GPC was higher in 2019 than in 2018 for all N levels (P ≥ 0.05). Significant differences between water levels were only found in the N3 plots of 2018, with higher values in W2 (Fig. 3b). Therefore, the GPC of the W1 and W2 levels of 2018 were plotted separately, while the two water levels of 2019 were plotted together. The GPC increased linearly with N fertilization in 2019, and it fitted a quadratic model in 2018 for the two water levels. The different N fertilization rates produced significant differences between all N levels in 2018, as well as in 2019, except for the N-stressed plots (N0 and N1).

Fig. 3
Winter wheat a yield (kg ha −1 ), b grain protein concentration (%), and c N output (kg N ha −1 ) response curves to N availability (soil mineral + fertilizer) according to year (2018 and 2019). Variables were separated by water levels (W1 and W2) when significant differences were observed. Symbols are the mean value with standard errors as bars. Lines represent the adjusted model 1 3 The N output was higher in 2018 than in 2019 (Fig. 3c). Nevertheless, this difference between years was not found in the N-stressed plots (N0 and N1) (P ≥ 0.05). The effect of the N fertilization in N output was stronger in 2018 than in 2019, as the different N fertilization rates led to differences in N output between all N levels in 2018, whereas the N output of the well-fertilized plots (N2 and N3) was not different in 2019 (P ≥ 0.05). Consequently, the N output fitted a quadratic plateau model in 2019, with a maximum of 114 kg N ha −1 in N output, which was reached with 251 kg of available N per hectare. The effect of water levels in N output, as well as in GPC, was not apparent in 2019 and was only found in the N3 plots of 2018, with a higher value in W2. Therefore, the two water levels were plotted together in 2019 (Fig. 3c).

Spectral differences due to treatments and selection of indices
The effect of the N levels on the reflectance spectra acquired at flowering by the airborne sensors was detected on the visible, NIR, and SWIR regions, with the differences between N levels being more obvious in 2018 (Fig. 4). Low N levels had higher reflectance in the visible region, probably due to a lower photosynthetic pigment absorption, whereas high N levels increased reflectance in the NIR in both years. Within the SWIR region, reflectance in the 1500-1700-nm range was particularly sensitive in discriminating between N levels. This was attributed to the absorption feature of N-H bonds located in this region (Curran, 1989).
Differences in the spectra between water levels were more evident in 2019 and in the N-stressed plots of 2018, which were particularly detectable in the SWIR region (Fig. 4). In this region, the plots with less water availability presented higher reflectance. This pattern was also observed in the green and red wavelengths. The reflectance in the NIR region increased in the W2 plots of 2019.
The R 2 contour maps revealed the importance of using the adequate spectral region for an accurate estimation of each winter wheat trait (Fig. 5). Overall, yield was the wheat trait best estimated by the NDSIs, yielding a value of R 2 > 0.6 with most NDSIs that used an NIR or SWIR in combination with a visible band (especially green) or an NIR and SWIR band. The highest R 2 value (0.85) in all contour maps was obtained when estimating Fig. 4 Canopy spectra acquired with the hyperspectral aerial imagery in the two water levels (W1 and W2) of N0 and N3 fertilizer treatments at flowering each year yield with the NDSIs constructed with different combinations of bands within the red-edge and/or NIR regions. On the other hand, the best GPC estimation (R 2 = 0.72) was obtained with NDSIs based on reflectance at SWIR between 1600 and 1750 nm and visible region (green), followed by specific wavelengths in the NDSI (NIR, red-edge). The best N output estimation (R 2 = 0.73) was achieved by NDSIs based on bands in the NIR region (around 790 nm) and red-edge (around 750 nm), or two bands within the red-edge. Despite the similar maximum R 2 value obtained in estimating GPC and N output, N output showed a more robust correlation with other NDSIs, for example, most of the NDSIs (visible or rededge, NIR or SWIR) presented a value of R 2 > 0.5 with N output, while most of the NDSIs presented a value of R 2 < 0.3 with GPC.
The contour maps showed that NDSI (1650 nm, 550 nm) was highly correlated with GPC, while the yield estimation was more accurate when the 550-nm band was changed by shorter or longer wavelengths (Fig. 5). Overall, the contour map calculated for GPC showed similar patterns to the contour map calculated for plant %N, and the yield contour map was similar to the biomass contour map (Fig. 5).
The VIs used as structural, chlorophyll, and SWIR input variables in the ensemble models were selected according to their performance in the R 2 contour maps (Fig. 5) and the lack of correlation between them. To estimate yield, the VI selected as a proxy of chlorophyll content was NDSI (799 nm, 755 nm), and the SWIR-based index was NDSI (1106 nm, 1066 nm). They were selected because of their linear correlation with yield (R 2 = 0.76 in both indices) and because there was no collinearity between them (Pearson coefficient ≤ 0.75). Due to the sensitivity of the red-edge reflectance to Fig. 5 Contour maps representing the R 2 from the linear relationship of wheat traits (yield (kg ha −1 ), grain protein concentration (%), and N output (kg N ha −1 )) and crop parameters at flowering (biomass and plant %N) against all possible normalized difference spectral indices (NDSIs) calculated with the hyperspectral airborne imagery acquired at flowering each year. The regions not covered by the sensors (850-950 nm) and the water absorption wavelengths are in white chlorophyll content (Inoue et al., 2016), different studies reported the good performance of NDSI (NIR, red-edge) for estimating chlorophyll (Fitzgerald et al., 2006;Zillman et al., 2015) or N content (Inoue et al., 2012;Li et al., 2013). They are crucial component involved in photosynthesis, and therefore their content affects biomass production (Fig. 5), and final yield (Quemada & Gabriel, 2016). This explains why this study, in agreement with previous research (Adak et al., 2021;Raya-Sereno et al., 2021b;Wang et al., 2019), obtained a good correlation between NDSI (NIR, red-edge) and wheat yield. The correlation between NDSI (1106 nm, 1066 nm) and yield can be explained because nearby wavelengths are the absorption feature of the structural biochemical components of plants, such as lignin (1120 nm), and the N-H absorption wavelength located at 1020 nm (Curran, 1989). Cell structure is affected by the nutritional and water status, which has an effect on plant growth, and therefore these wavelengths are related to yield (Thenkabail et al., 2013). We did not find any structural index that presented no collinearity with the chlorophyll and SWIR selected indices. This study agrees with previously developed contour maps showing that NDSI (NIR, green) was the most suitable structural index for biomass (Hansen & Schjoerring, 2003) and yield estimation (Raya-Sereno et al., 2021b). For these reasons, the structural index used in the ensemble models for estimating yield was NDSI (800 nm, 550 nm), which corresponds to GNDVI (Gitelson et al., 1996). The chlorophyll strongly absorbs light in the visible region, especially in the blue and red bands (Sims & Gamon., 2002), and thus GNDVI has a lower value than NDVI (NDSI (NIR, red)) and tends to saturate later (Rodrigues et al., 2018).
To estimate GPC, the proxy of chlorophyll content was NDSI (795 nm, 750 nm) because it has one of the highest accuracies in GPC estimation (R 2 = 0.70; Fig. 5). This NDSI belongs to the small region of the GPC contour map based on NIR and red-edge with a high R 2 value. A relationship between NDSI (NIR, red-edge) and GPC was also reported by Raya-Sereno et al. (2021b) and Fu et al. (2022). This NDSI presented a Pearson coefficient of ≤ 0.75 with NDSI (1650 nm, 545 nm), which is one of the SWIR indices most closely correlated with GPC (R 2 = 0.64); therefore, it was selected as the SWIR-based NDSI used to estimate GPC. The performance of the SWIR index for GPC estimation relies on the protein feature band near this region (Curran, 1989). Similarly, Söderström et al. (2010) successfully used the simple ratio of SWIR (1550-1750 nm range) and green reflectance for GPC estimation in barley, and Zhao et al. (2005) reported the suitability of the same SWIR region for GPC estimation in wheat. All structural NDSIs presented a weak correlation with GPC (R 2 < 0.1), but the correlation with NDSI (NIR, green) was slightly higher (P < 0.05). For this reason, due to the correlation with biomass at flowering and to the lack of collinearity with the other GPC estimators, GNDVI was selected as the proxy of plant structure to estimate GPC with the ensemble models.
One of the NDSIs that exhibited the best correlation with N output was NDSI (778 nm, 752 nm) (R 2 = 0.74); therefore, it was used as the chlorophyll index in the N output estimation models. Likewise, Prey and Schmidhalter (2019b) used NDSI (770 nm, 750 nm) for N output estimation. The suitability of this NDSI to assess winter wheat N uptake at the flowering stage was highlighted in the contour maps developed by Li et al. (2013). This NDSI presented no collinearity with an SWIR-based NDSI that was correlated with N output: NDSI (1650 nm, 520 nm) (R 2 = 0.62). The chlorophyll and the SWIR VIs selected for estimating N output were correlated with all structural NDSI. The GNDVI was selected as the structural index for estimating N output with the ensemble models because it presented a value of R 2 = 0.65 with N output and was correlated with plant biomass at flowering.

Assessment of wheat trait estimation with hyperspectral airborne imagery using ensemble models
The performance of estimating wheat traits with a single NDSI was improved when combining different indices with the ensemble models (Fig. 6). In this study, the three ensemble models performed similarly when using three or more indices to estimate any of the wheat traits. However, significant differences between the accuracy of the models were observed when using only two indices, which resulted in a lower accuracy of the RF model. Overall, the RF showed an improvement in estimation when more indices were used. The good performance of the MLR occurred because a linear regression model (contour maps) was used to select the explanatory variables (NDSIs) and, therefore, a linear relationship between the response and the explanatory variables exists (Sellam & Poovammal, 2016).
Yield was the wheat trait best estimated when using ensemble models (Fig. 6), as observed in the R 2 contour maps (Fig. 5). The most accurate yield estimation was obtained when combining the structural (GNDVI), chlorophyll (NDSI (799 nm, 755 nm)) and SWIR (NDSI (1106 nm, 1066 nm)) indices with ANN (R 2 = 0.86; RMSE = 493.17 kg ha −1 . Fig. 6 Coefficient of determination (R 2 ) and root mean square error (RMSE) obtained when using airborne sensors to estimate wheat traits: a yield (kg ha −1 ), b grain protein concentration, and c N output with linear regression (LR), multiple linear regression (MLR), artificial neural network (ANN), and random forest (RF). Indices used are on the x axis: spectral vegetation indices based on visible-near infrared regions related to chlorophyll content (chl) and plant structure (stru), a vegetation index that includes a band within the SWIR region (SWIR), solar-induced fluorescence (SIF) and water deficit index (WDI). Different white letters indicate significant differences (p ≥ 0.05) between ensemble models with the same set of indices. Colored letters indicate differences between the same ensemble models using a different set of indices Figure 6a). In this case, the most important estimator according to the IncNodePurity was the SWIR index. When the SIF was included in the analysis, it obtained the highest or the second highest IncNodePurity value (Fig. 7), and was correlated with yield (R 2 = 0.73; data not shown), but no improvement was achieved when included in the models. When using only NDSI (773 nm, 753 nm), which is the best NDSI from the contour maps, to estimate yield with tenfold cross-validation, values of R 2 = 0.84 and RMSE = 521.18 kg ha −1 were obtained. This result indicates that combining different NDSIs with ANN improved the yield estimation.
The maximum R 2 value obtained when estimating GPC with the NDSIs or with the ensemble models was the lowest among all wheat traits analyzed, indicating that it is the most challenging trait to estimate. The most accurate GPC estimation was obtained with MLR using the chlorophyll (NDSI (795 nm, 750 nm)), structural (GNDVI), and SWIR (NDSI (1650 nm, 545 nm)) indices (R 2 = 0.73; RMSE = 0.19%N; Fig. 6b); this was the only model that outperformed the tenfold cross-validation results obtained with NDSI (1701 nm, 551 nm) (R 2 = 0.72; RMSE = 0.20%N). When using only VNIR-based NDSIs, the highest R 2 value was 0.68 and the lowest RMSE was 0.21%, which indicates that GPC estimation is the one that improved the most when including SWIR reflectance. According to the IncNodePurity, the most important indices for GPC estimation were chlorophyll and SWIR, which showed important differences with the other indices in IncNodePurity value in all cases. No correlation between GPC and the SIF was found, and there was no improvement when the SIF was included in the GPC estimation models.
The most accurate estimation of N output was obtained with MLR using the chlorophyll (NDSI (778 nm, 752 nm)), structural (GNDVI), and SWIR (NDSI (1650 nm, 520 nm)) indices (R 2 = 0.74; RMSE = 15.47 kg N ha −1 ; Fig. 6c). The IncNodePurity indicated that the most important indices in the estimation were chlorophyll and SWIR; however, the difference with the structural index was smaller than in the GPC estimation. When including only VNIR-based VIs in the ensemble models, the best performance was obtained with the MLR with values of R 2 = 0.72 and RMSE = 16.2 kg N ha −1 . Despite a correlation between SIF and N output being found (R 2 = 0.27; p < 0.001; data not shown), no improvement in the estimation was attained when the SIF was included in the models. Pancorbo et al. (2021b) showed in the same study site that the WDI was able to distinguish between water levels with minimum effect on the N levels; however, the WDI did not enhance any trait estimation despite being correlated with yield (R 2 = 0.44; p < 0.001) and with N output (R 2 = 0.18; p < 0.001), but not with GPC (R 2 < 0.1; p > 0.1).

Convolved validation and assessment of wheat trait estimation with Sentinel-1 and Sentinel-2 using ensemble models
High similarities were found between the NDSIs extracted from the Sentinel-2 imagery and the NDSIs calculated with the convolved bands (R 2 > 0.71, n = 118 pixels; Supplementary Material S.2). These strong relationships justify using the convolved indices (Table 2) to test the estimation capacity of Sentinel-2.
The accuracy was similar when yield was estimated from aircraft imagery or Sentinel-2 band-derived NDSIs (Figs. 6a and 7a). The best yield estimation when using the Sentinel dataset was obtained with the ANN model using the structural (GNDVI), chlorophyll (NDSI (B8,B6)), and SWIR (NDSI (B11,B8)) indices (R 2 = 0.85; RMSE = 507.08 kg ha −1 ; Table 2; Fig. 8a). The same model and variables produced the best estimation when indices were calculated with the hyperspectral airborne imagery, but with a slightly more accurate estimation (R 2 = 0.86; RMSE = 493.17 kg ha −1 ; Fig. 6a). According to the Inc-NodePurity, the most important indices to estimate yield with Sentinel were the chlorophyll and the SWIR indices in all cases (Fig. 7). With the aircraft imagery the SWIR index also reached the highest IncNodePurity value in most cases. When using only the VNIR Sentinel-2 bands, the combination of the structural and chlorophyll indices Fig. 7 Importance of the variables according to the increase in node purity (IncNodePurity) when estimating yield, grain protein concentration, and N output with the airborne hyperspectral and Sentinel imagery. Chl and Stru are spectral vegetation indices based on visible-near infrared regions related to chlorophyll content and plant structure, respectively. SWIR indicates a vegetation index that includes a band within the SWIR region. SIF and WDI stand for solar-induced fluorescence and water deficit index, respectively. S1 indicates the radar vegetation index calculated with Sentinel-1 images (R 2 = 0.84; RMSE = 519.67 kg ha −1 ) outperformed the index with the highest R 2 value in the contour maps but calculated with the Sentinel-2 bands (NDSI (B7, B6) (R 2 = 0.80; RMSE = 582.13 kg ha −1 ), highlighting the importance of combining different indices with ensemble models. The best estimation of GPC with the Sentinel dataset was obtained with the MLR model using the structural (GNDVI), chlorophyll (NDSI (B8, B6)), and SWIR (NDSI (B11, B3)) indices (R 2 = 0.69, RMSE = 0.19%N; Table 2; Fig. 8b). The same indices gave the best result with the aircraft imagery (Fig. 6b). The GPC was the wheat trait that presented the highest estimation improvement when the SWIR bands were included in the model, compared to using only the VNIR bands (R 2 < 0.15, RMSE > 0.34%N). According to the Inc-NodePurity, the most important estimator was the SWIR index, showing an important difference with the other indices (Fig. 7). The performance of using only the SWIR index was tested (R 2 = 0.65, RMSE = 0.20%N), but a better result was achieved when it was combined with the VNIR indices.
The most accurate estimation of N output with the Sentinel dataset was obtained with the MLR model using the structural (GNDVI), chlorophyll (NDSI (B7, B8)), and SWIR Fig. 8 Coefficient of determination (R 2 ) and root mean square error (RMSE) obtained when using Sentinel imagery to estimate wheat traits: a yield (kg ha −1 ), b grain protein concentration, and c N output with linear regression (LR), multiple linear regression (MLR), artificial neural network (ANN), and random forest (RF). Indices used are on the x axis: spectral vegetation indices based on visible-near infrared regions related to chlorophyll content (chl) and plant structure (stru), vegetation index that includes a band within the SWIR region (SWIR), and radar vegetation index (RVI). Different white letters indicate significant differences (p ≥ 0.05) between ensemble models with the same set of indices. Colored letters indicate differences between the same ensemble models using a different set of indices 1 3 (NDSI (B11, B2)) indices (R 2 = 0.71; RMSE = 16.46 kg N ha −1 ; Table 2; Fig. 8c), as obtained with the aircraft imagery when estimating N output (Fig. 6c). The Sentinel results are in agreement with the aircraft imagery showing that N output estimation is more accurate than GPC estimation but less than yield estimation. The structural index obtained the highest IncNodePurity value in all cases, but it was similar to the value of the chlorophyll index (Fig. 7). These indices also obtained the highest IncNodePurity value in all models with the aircraft imagery. The N output estimation was more accurate when using only one index based on the red-edge bands (NDSI (B7, B6)) than when combining the chlorophyll and the structural indices (R 2 = 0.59; RMSE = 19.47 kg N ha −1 ), but including SWIR bands for N output estimation when using Sentinel information, improved the performance compared with the best result obtained using only VNIR bands (R 2 = 0.62; RMSE = 18.95 kg N ha −1 ).
Differences in RVI between water levels were found in 2018 (P < 0.05; data not shown); however, no improvement was achieved when it was included in the analysis of any trait estimation.

General discussion
The present study analyzed whether the estimation of different winter wheat traits improved by combining indices related to different crop biophysical parameters. In all cases, an improvement was achieved when combining different indices rather than using one index only. Overall, the results indicated that a visible-SWIR sensor was the most suitable for all winter wheat traits; however, the spectral resolution and range was an important factor when estimating some traits.
Estimating yield based on hyperspectral VSWIR reduced RMSE by 3% compared with the Sentinel-2 estimation. If the VNIR region only was used, the RMSE difference between hyper-and multispectral sensor estimation was < 0.3%. Due to this small reduction in RMSE, the adequate spatial and temporal coverage, and its free availability, Sentinel-2 imagery is suitable for accurately estimating yield in large areas; moreover, including the SWIR bands reduced uncertainty. The most important index for yield estimation was the SWIR index (NDSI (1106 nm, 1066 nm)), which is affected by lignin content and therefore by biomass, that is related to final yield (Marti et al., 2007). When SIF was included in the analysis, it obtained the highest importance; this can be explained by the link between SIF and photosynthesis rate, which is affected by N (Camino et al., 2018) and water availability (Zarco-Tejada et al., 2012). Other studies also reported the utility of Sentinel-2 for wheat yield estimation using different techniques: Skakun et al. (2017) used Sentinel-2 and Landsat-8 time series for the peak-NDVI approach and obtained an RMSE of 310 kg ha −1 . Mehdaoui and Anane (2020) reduced the RMSE to 380 kg·ha −1 using the red-edge bands. Segarra et al. (2022) combined multidate Sentinel-2 information with ensemble models to achieve an RMSE of 740 kg ha −1 . Cavalaris et al. (2021) used EVI and NMDI for durum wheat yield estimation and obtained an RMSE of 538 kg ha −1 . Hunt et al. (2019) reduced the RMSE in winter wheat yield estimation from 660 kg ha −1 when using only Sentinel-2 information to 610 kg ha −1 when it was combined with environmental data. For this reason, our study encourages further research to include environmental data when aiming to estimate crop traits.
The GPC estimation was less accurate than the estimation of the other traits, and its accuracy depended greatly on the spectral region and resolution used. The Sentinel-2 VNIR bands were not suitable for GPC estimation in this study, as the RMSE was 39% higher than the RMSE obtained with the hyperspectral VNIR or with the VSWIR Sentinel-2 band. The difference in RMSE between hyper-and multispectral VSWIR sensors was 8.1%, the same as between a hyperspectral VNIR and VSWIR sensor. Therefore, it is recommended to use a hyperspectral VSWIR sensor for GPC estimation. Raya-Sereno et al. (2021b) also reported that broad bands were reliable for yield prediction; however, an accurate GPC and N output estimation required narrow bands. The need for a SWIR narrow band is attributed to the N-H bond absorption feature in the SWIR region (Curran, 1989), and because the N stored in vegetative organs is an important source for final GPC (Kichey et al., 2007). In addition, different factors affect reflectance in the SWIR region that can mask the N influence in this region (Yan et al., 2021). A review study (Bastos et al., 2021) indicated that VIs based on the absorption and reflectance peak of chlorophyll (blue and green, respectively) are commonly used to estimate GPC near anthesis (GS60) because most leaf N is contained in chlorophyll (Wang et al., 2004). In the current study, a relationship (R 2 ~ 0.5) between NDSI (green, blue) with GPC and with plant %N was obtained. Zhao et al. (2019) applied multilinear regression using crop parameters together with Sentinel-2 information and obtained a maximum value of R 2 = 0.47 when estimating GPC.
The most accurate N output estimation was achieved by the hyperspectral VSWIR sensor; however, the differences in RMSE with the hyperspectral VNIR was only 3.1%. The RMSE obtained with the hyperspectral VNIR sensor was 15.7% lower than with the multispectral VNIR Sentinel-2, but this difference was reduced to 2.7% if the multispectral SWIR bands were included. Therefore, if a hyperspectral VSWIR sensor is not available, similar accuracies in predicting N output can be achieved with a hyperspectral VNIR. Despite the fact that the Sentinel-2 (R 2 = 0.71) and the hyperspectral (R 2 = 0.74) VSWIR bands showed potential for N output estimation, the hyperspectral sensor reduced the RMSE by 6%. The hyperspectral sensor was found to be important because the most important index in the N output estimation was constructed with two bands in the rededge, which are difficult to adapt to multispectral sensors. Similar results were obtained by Prey and Schmidhalter (2019a), who highlighted the importance of the Sentinel-2 red-edge bands for the prediction of winter wheat N-related traits.

Conclusion
This study highlights the importance of using multispectral and hyperspectral sensors covering different spectral regions for assessing winter wheat biophysical parameters and traits such as yield, grain protein concentration (GPC), and N output. In all cases, the best trait estimation was attained when combining spectral indices calculated by using bands from the visible, red-edge, NIR, and SWIR regions. Of the three wheat traits evaluated, yield obtained the most accurate estimation, and presented similar results when the indices were retrieved with a hyperspectral sensor or with multispectral Sentinel-2 bands. In the case of GPC, both sensors obtained satisfactory results when the SWIR information was included. However, an improvement of 8.1% was obtained with the hyperspectral sensor. The rededge and SWIR-based indices were important for improving N output estimation with both sensors. Despite a more accurate estimation being attained with the hyperspectral bands, the potential of using Sentinel-2 bands for wheat trait assessment at large scales was great.