Estimation of canopy nitrogen content in winter wheat from Sentinel-2 images for operational agricultural monitoring

Canopy nitrogen content (CNC, kg/ha) provides crucial information for site-specific crop fertilization and the usability of Sentinel-2 (S2) satellite data for CNC monitoring at high fertilization levels in managed agricultural fields is still underexplored. Winter wheat samples were collected in France and Belgium in 2017 (n = 126) and 2018 (n = 18), analysed for CNC and S2-spectra were extracted at the sample locations. A comparison of three established remote sensing methods to retrieve CNC was carried out: (1) look-up-table (LUT) inversion of the canopy reflectance model PROSAIL, (2) Partial Least Square Regression (PLSR) and (3) nitrogen-sensitive vegetation indices (VI). The spatial and temporal model transferability to new data was rigorously assessed. The PROSAIL-LUT approach predicted CNC with a root mean squared error of 33.9 kg/ha on the 2017 dataset and a slightly larger value of 36.8 kg/ha on the 2018 dataset. Contrary, PLSR showed an error of 27.9 kg N/ha (R2 = 0.52) in the calibration dataset (2017) but a substantially larger error of 38.4 kg N/ha on the independent dataset (2018). VIs revealed calibration errors were slightly larger than the PLSR results but showed much higher validation errors for the independent dataset (> 50 kg/ha). The PROSAIL inversion was more stable and robust than the PLSR and VI methods when applied to new data. The obtained CNC maps may support farmers in adapting their fertilization management according to the actual crop nitrogen status.


Introduction
Nitrogen (N) fertilization is of high importance in crop production since it is directly related to the photosynthetic process (Andrews et al., 2013). Furthermore, N is a critical factor for crop growth and consequently yield (Erisman et al., 2008). An excess application of N may lead to N leaching and increased nitrate concentrations in the surrounding water bodies and the groundwater (Cameron et al., 2013), which has major negative impacts on the environment, including on human health.
In intensive wheat cropping systems, N fertilizer application is split up into three fractions to minimize the nitrogen loss and to optimize the N uptake by the crops. The estimation of canopy nitrogen content (CNC, kg/ha) during stages 30-39 (stem elongation) provides crucial information that may allow for the adjustment of fertilization to the crops' needs (Delloye et al., 2018). The relevant CNC information for variable rate N-fertilization can be derived from optical reflectance data. With the launch of the multispectral Sentinel-2 satellite constellation (S-2), which is part of the European Commission's Copernicus programme, an observation of agricultural areas is possible every 5 days under cloud-free conditions. The spatial resolution of 10-20 m and repetition time seem suitable for monitoring CNC at the sub-field scale during the growing season. However, there is a need to assess the retrieval of CNC from operational S-2 data over conventionally managed agricultural fields following common agricultural practices, as opposed to many studies that use fertilizer trials with artificially increased fertilizer rates that result in large CNC variation.
In general, remote sensing-based methods for the retrieval of CNC or related biophysical attributes can be divided into four major approaches: (1) Parametric regression or vegetation indices (VI), (2) empirical models including multivariate statistical (MS) and machine learning (ML) methods, (3) physically based radiative transfer models (RTM), and (4) hybrid methods, a combination of RTM and ML (Berger et al., 2020a;Verrelst et al., 2019). Wheat CNC was retrieved using different combinations of sensors (field spectrometer to satellite) and methods. Among the available body of literature, some papers looked at CNC retrieval in managed wheat fields from satellite data: Using RapidEye data and the normalized difference red-edge index (NDRE), Magney et al. (2017) achieved a root mean squared error (RMSE) of about 16 kg N/ha of CNC on conventional managed fields over 3 years (2012-2014) but did not transfer the model to other study sites or different years. Söderström et al. (2017) used MSAVI-2 based on the DMC satellite constellation (www. dmcii. com) and NDRE based on the S-2 and obtained accuracies of 11-15 kg N/ha on winter wheat fields in central and south-west Sweden under consideration of the variety. MS and ML methods were mostly applied in studies that are based on hyperspectral (HS) measurements acquired from field spectrometers, but increasingly also from drone-based HS sensors (Abdelbaki et al., 2021). Thorp et al. (2017) used a genetic algorithm and partial least square regression (PLSR), resulting in a cross-validated RMSE of about 16%. Hansen and Schjoerring (2003) compared RMSE values obtained using PLSR (7.8 kg N/ ha) and RMSE using all possible normalized difference VIs (8.3 kg N/ha). Due to their empirical nature, VIs, MS and ML methods are considered to have limited transferability in space and time and are specific to the sensors on which they were calibrated.
Studies using retrieval methods based on RTMs, e.g. PROSAIL (Jacquemoud et al., 2006), mostly focus on the estimation of the model variables leaf area index (LAI), leaf chlorophyll content (Cab), or canopy chlorophyll content (CCC) (e.g. Botha et al., 2010;Croft et al. 2020;Darvishzadeh et al., 2008;Si et al., 2012;Verrelst et al., 2014Verrelst et al., , 2015aVerrelst et al., , 2015bDanner et al., 2017). Most of the studies using a PROSAIL inversion scheme end with the prediction of LAI, Cab or CCC. From LAI or CCC, CNC can be obtained using an empirical relationship that first needs to be established based on a sample of field and lab measurements. One of the few studies that linked a PROSAIL-retrieved variable empirically with CNC is Delloye et al. (2018); the authors used an artificial neural network (ANN) in the inversion procedure to retrieve the CCC from winter wheat on managed fields and linked it with the CNC, thereby achieving accuracies of approximately 11-36 kg N/ha depending on the growth stage.
N mapping is often realised via the significant link between Cab content and N (Baret et al., 2007;Croft et al., 2020). Even if the Cab pigments represent a rather small portion of N (approximately 2%), for certain crops and growth stages good Cab-N correlations are present. In general, Cab can be retrieved through RTM inversions with a good accuracy (Delloye et al., 2018). For instance, Croft et al. (2020) demonstrated the use of Landsat-8 imagery for the retrieval of CCC utilizing also a PROSAIL inversion procedure with an RMSE of 49.6 µg/cm 2 and an R2 of 0.87 and mentioned the usability of CCC for quantifying the crop N status due to the strong relationship of Cab and N (Sage et al., 1987).
Nevertheless, the relationship gets lost after the mature growth stage when leave N is translocated to other plant organs like grains (Berger et al., 2020a(Berger et al., , 2020b. Contrary, N fertilisation and CNC assessment are relevant mostly at early crop development stages and thus, weakening of Cab-N relations at mature stages is less of problem. PROSAIL has been further developed to allow for studies on Yang et al. (2015) replaced the Cab absorption coefficients in PROSPECT with a N-specific absorption coefficient to directly simulate a N-specific response (N-PROSPECT) in winter wheat. This offers the opportunity to retrieve the leaf N density in an inversion of the N-PROSPECT model. Wang et al. (2015) studied the accuracy of PROSPECT-5 in the SWIR domain based on leaf-optical properties from 50 species to derive protein content on leaf-level. Li et al. (2018) extended the N-PROSPECT by coupling it with the SAIL model to create N-PRO-SAIL and retrieved an RMSE of 9.5 kg N/ha in its estimation of the CNC of winter wheat, using hyperspectral data recorded with an ASD FieldSpec spectrometer. Another development in this direction is PROSAIL-PRO (Berger et al., 2020b), which includes newly calibrated specific absorption coefficients of proteins. PROSAIL-PRO requires high resolution hyperspectral full range measurements including the SWIR domain. Recently, this promising approach has been used to estimate CNC is via the protein absorption features located in the SWIR domain using hybrid RTM inversion  with promising results (RMSE = 21-23 kg N/ha). Such approaches will be applicable in agricultural N monitoring once hyperspectral satellite missions like CHIME or SBG become operational.
Contrary, this study intends to use operational S-2 data for continuous monitoring of crop N status which is unsuitable for N-PROSAIL or PROSAIL-PRO applications. Instead, it is assumed that there is a relationship between CNC and LAI or CCC, based on the assumption that spatial variations in CNC are mostly caused by variations in biomass and less by variations in leaf N concentration, which are often at elevated levels in managed agricultural fields.
The hypothesis of this study is that a CNC retrieval method based on an RTM is more reliable by transferring the model in space and time compared to widely utilized models based on statistical approaches (PLSR) or N-sensitive VI's. All of these methods are highly established and widely used in remote sensing science. Most studies often use a soft validation scheme, e.g. leave-one-out cross validation, and thus lack transferability to other study sites and growing seasons (e.g. Magney et al., 2017;Thorp et al., 2017). As PLSR 1 3 and VI's are mainly data driven it is assumed that the RTM based method will show more reliable error margins by using different validation schemes. Therefore, following objectives were addressed: (1) The assessment of CNC retrieval accuracy from S-2 derived imagery using three established methods usable also for operational services: empirical (N-sensitive VIs), statistical (PLSR) and physically based (PROSAIL canopy reflectance model) and assess the feasibility of the approaches for operational use with reliable error margins.
(2) Evaluate two validation schemes to assess the influence of the validation procedure on the accuracy assessment of the three different remote sensing based CNC retrieval methods.
(3) Based on the PROSAIL retrieved variables there were three strategies identified to link those variables to field-measured CNC: (i) relate LAI to CNC, (ii) compute LAI*Cab = CCC, and link CCC to CNC, iii) relate Cab to lab-measured leaf N concentration (LNC) and multiply LNC with biomass to retrieve CNC. The last objective addresses the assessment of the best retrieval strategy based on PROSAIL retrieved variables.

Study area
In the study area in northern France and southern Belgium, a total of eleven study sites were identified that were probed either in 2017 or in 2018 (Fig. 1). Climatically, the study area belongs to the Cfb class (temperate oceanic climate) of the Köppen-Geiger climatic classification system. From south to north within the study area, the average temperature Fig. 1 Overview of the 2017 and 2018 field sites in France and Belgium. In 2017 the field sites were selected along a north to south gradient, and in 2018, on an east-west orientation drops from 10.5 °C (Troyes, France) to 9.8 °C Namur (Belgium) and the average precipitation rises from 650 to 819 mm. During the field campaigns in France, a drought occurred in April until mid-May 2017, which affected plant growth.
From west to east, the average climatic condition changed from 10.7 °C and 688 mm precipitation per year (Le Havre) to 9.7 °C and 731 mm rainfall (Nancy). In 2018, no significant weather anomalies were noticed during the field campaign.

Field and laboratory work
On each of the eleven sites, one to four agricultural fields were selected and, in each field, two to four plots were established (Table 1). In total, 204 samples were taken in the field in April to May 2017, and 21 samples in May 2018 (Table 1). Of those 204 samples in 2017, 11 samples had to be discarded due to inappropriate cooling prior to N analysis, resulting in a final number of 193 samples for 2017. From the 2018 sampling set, two samples were withdrawn from further analysis due to mouldy sample material. Finally, 212 samples were analysed for CNC. For this dataset the statistical relationship between field observations (e.g. LAI) and laboratory measurements was carried out (n = 193), while just for the samples with available cloud-free S-2 imagery (n = 126) the modelling part of this study was conducted. Table 2 gives an overview on the dates of the field campaigns and the BBCH stages.
Each plot was 40 × 40 cm 2 and established along transects with an approximate distance of 100 m between neighbouring plots to best cover the variation within a field. Within each plot, Cab content, LAI and canopy reflectance were measured. Afterwards the plots were harvested, and the fresh biomass was recorded. In the lab, dry matter and N content were determined with a CNHS analysis (LECO TruSpec Elemental Determinator). CNC was calculated as The values obtained were converted to units of (kg/ha). Cab content was estimated indirectly using a Minolta SPAD-502. Five leaves per plot were sampled and on each leaf five measurements were taken, so each sampling area was represented by 25 measurements. The leaves were chosen randomly within the sampling area.
SPAD values were transferred into Cab values using the formula described for wheat cultivars by Uddling et al. (2007): A Licor LAI 2000 was used to measure LAI. Five below canopy and one above canopy measurements were taken during overcast weather conditions or during sunshine using artificial shading with an umbrella covering the sensor optic from direct illumination. During the 2018 campaign, only biomass and CNC were measured.

Sentinel-2 data and acquisition
Image data of S-2 were acquired with the Copernicus Open Access Hub (https:// scihub. coper nicus. eu/ dhus/#/ home). In spring 2017, only S-2A images were available, with a revisit time of 10 days. In total, for 126 plots S-2 data were available with mostly less than 1 week time distance to the field campaigns under cloud-free conditions in 2017. In 2018, 18 plots were covered by S-2 in this way, leading to a total of 144 plots with corresponding S-2 image spectra (Tables 1, 3). The pre-processing of the images included the resampling of all S-2 bands to 10-m pixel resolution and atmospheric correction from L1C level to L2A level with the Sen2Cor module of the ESA SNAP toolbox.

Retrieval methods
A selection of nine VIs that were described in the literature as sensitive to N were analysed (Table 3): Red Edge Inflection Point (REP), Normalized Difference Red-Edge (NDRE and NDRE2), Modified Chlorophyll Absorption in Reflectance Index (MCARI), MERIS Terrestrial Chlorophyll Index (MTCI), Enhanced Vegetation Index (EVI2), Chlorophyll Index red-edge (CI red-edge ), Chlorophyll Index green (CI green ) and, because of its widespread use in measuring greenness, the NDVI (Normalized Difference Vegetation Index). The formulation and literature sources are given in Table 4. PLSR is an advanced multiple linear regression method described by Wold et al. (2001). The S-2 spectral information is used as independent variables to predict the CNC through a dimensionality reduction of the collinear spectral input features (10 S-2 bands) to a few non-correlated latent variables by maximizing the variance between the spectral information of the S-2 bands and CNC.
PROSAIL (Jacquemoud et al., 2006) is a combination of the SAIL canopy reflectance model and the PROSPECT leaf optical properties model, in this study the PROSAIL 5B Clevers and Gitelson (2012), Guyot et al. (1988) Merzlyak (1994, Sims andGamon (2002) Clevers and Gitelson (2012) MCARI Clevers and Kooistra (2012) Rouse et al. (1973) model implemented in Matlab (http:// teled etect ion. ipgp. jussi eu. fr/ prosa il/) was deployed PROSPECT-5 (Feret et al., 2008;Verhoef et al., 2007) calculates the leaf's hemispherical reflectance and transmittance as function of six input parameters, as described in Table 5. The PROSPECT calculations are inputs to the SAIL model (Jacquemoud et al., 2009). SAIL is a one-dimensional turbid medium RTM, which describes the canopy as a horizontally homogeneous and semi-infinite layer that consists of small vegetation elements acting as absorbing and scattering particles of a given geometry and density (Darvishzadeh et al., 2011). This means that the model performs satisfactorily when simulating homogeneous vegetation canopies. PROSAIL inversion strategies can be based on numerical optimization methods, hybrid methods including ML algorithms (e.g. artificial neural networks) or radiometric driven methods like look-up table (LUT) (Danner et al., 2017;Kimes et al., 2000;Verrelst et al., 2019). A LUT stores the input variable values of the PROSAIL parametrization and the simulated spectra. LUTs permit a global search, which avoids local minima while showing a less unexpected behaviour when the S-2 spectra is not well represented by PROSAIL simulations (Darvishzadeh et al., 2008). To find a solution to the inversion, the LUT was sorted according to a cost function (Combal et al., 2003;Verrelst et al., 2015aVerrelst et al., , 2015bVohland et al., 2010).
Most of the variable ranges, respectively fixed parameters, to generate the LUT were estimated using the knowledge gathered in field and laboratory work, as well as being based on literature and S-2 metadata (e.g. solar zenith angles). In this way, a suitable LUT was established, large enough to find a solution with high accuracy for the inversion problem (Combal et al., 2003;Darvishzadeh et al., 2008;Weiss et al., 2000). The leaf angles were simulated using the leaf inclination distribution function (LIDF). The two parameters LIDFa and LIDFb were determined iteratively in the forward mode of PROSAIL until the simulated spectra encompassed the observed range of S-2 reflectance. An adjusted spherical distribution was found to be the best distribution to generate simulations which represent best the S2 spectra of the wheat plots.
To find a solution for the inversion problem, a cost function had to be defined, which gave a measure for the similarity of an observed S-2 spectrum (R obs ) and a simulated spectrum (R sim ) of ProSAIL. In this study, RMSE was used as cost function: where RMSE LUT,k is the RMSE of LUT entry k, R obs,i is the reflectance of S-2 band i, R sim,i is the simulated reflectance of LUT entry k at band i and n is the number of spectral bands.
The LUT was resampled to the spectral resolution of S-2 using the Spectral Response Functions published by ESA. S-2 band 2 (490 nm) and band 8 (842 nm) was excluded in the inversion, as they were not well represented in the LUT modelling. RedEdge and SWIR bands stabilized the inversion results, excluding them led to a decrease in the model accuracy.
The retrievals of LAI and Cab were achieved by searching for the position of the LUT entry with minimum RMSE LUT and picking the corresponding input variable value of this simulation. In several studies, authors achieved better results by using n-best solutions instead of considering the single best (e.g. Combal et al., 2003;Darvishzadeh et al., 2011). Consequently, a stable solution for the inversion problem was established by testing different sizes of n. Using a selection of the 100 LUT-entries with minimum RMSE LUT led to the most stable solutions with higher accuracy of R2 and RMSE. Averaging these corresponding 100 LAI and Cab input values gave the solution for the inversion problem, i.e. the retrieved variables. These retrievals represented the input parameter for the empirical models obtaining biomass and CNC (Fig. 2).

CNC retrieval models
The estimation of CNC with PROSAIL LAI and Cab retrievals requires a transfer function between LAI to biomass and CNC or between CCC and CNC. Hereby, the CCC is calculated by multiplying LAI with Cab. Three different methods for estimating CNC were established using the field and laboratory measurements of LAI, Cab (respectively CCC), biomass and LNC: (1) Directly link LAI and CNC (CNC-LAI) (2) Calculate CCC (LAI * Cab) and link it to CNC (CNC-CCC) (3) Model the relationship of LAI to dry biomass (DM) and the relationship between DM and CNC. (CNC-DM)

Calibration and validation schemes
The first strategy (scheme A) used the 2017 (n = 126) sampling set for the calibration of the PLSR and the VI-based empirical models, with the internal calibration using a leave-one out cross-validation (LOO-CV). Those models were validated with the independent sampling set of 2018 (n = 18) and the RMSE, RRMSE and R 2 are reported. Within the second strategy (scheme B), the 2017 and 2018 datasets were pooled to one dataset and randomly partitioned into 70% for calibration and 30% for validation. To avoid biased results due to an inconvenient sample choice of the calibration and validation Fig. 2 Overview of the different strategies to retrieve CNC. On the left side, the use of the field and laboratory data for model calibration is shown. The right side displays the application of the derived PROSAIL-LAI/Cab retrievals as the input for the empirical models to derive CNC datasets, this was repeated 50 times in a bagging procedure and the mean of RMSE, RRMSE and R 2 of the 50 iterations were reported.
In contradistinction to the PLSR and VI-based models, the PROSAIL inversion scheme does not require a calibration dataset. However, the PROSAIL LUT was generated with the basic knowledge of the field campaigns, laboratory measurements and the S-2 ensembles gathered for the plots.
Empirical linear relationships were only necessary in the second step of the PROSAIL retrieval chain. Those empirical models were set up with the relationships between the field data and the laboratory data of the 2017 field campaign. The models were generated in a repeated tenfold cross-validation in R (R Core Team, 2021).
The comparison of the three models (PLSR, VI-based models and PROSAIL) was done on the CNC retrievals of each model group using the same sampling sets used for the calibration and validation of the PLSR and the VI models. In the following section, the PRO-SAIL results therefore also refer to the terms "calibration" and "validation" dataset. Nevertheless, PROSAIL was not calibrated, rather model parameters were chosen from values reported in literature for them to provide a good representation of the variability of the wheat properties observed in the 2017 field campaigns. The validity of this parametrization was then checked against the 2017 field measurements.

Model accuracy estimation
The R-squared, RMSE and relative RMSE (RRMSE) express the model accuracy for the retrievals of each model group: The RRMSE refers to the mean ( Y mean ) of the observed field and laboratory measurements. The mean of CNC was calculated from the pooled 2017/2018 dataset, while the mean for LAI, Cab and CCC was calculated from the 2017 sampling set:

Descriptive statistics of field and laboratory data
Descriptive statistics of the measured variables separated by year (Tables 6, 7) reveal that the 2018 samples lie well within the ranges of the 2017 samples. The 2017 data, since it was collected over four campaigns during the growing season, have a larger value range and larger standard deviations and variation coefficients than the 2018 samples that were collected during a single growth stage.
The correlation matrix (Table 8) shows correlations higher than 0.8 between LAI, fresh biomass (FM), dry biomass (DM) and CNC. The correlation of 0.81 between remote sensing-based LAI and CNC may allow for an empirical estimation of CNC. LNC is not correlated to Cab (r = − 0.08), so the retrieval of CNC based on an empirical relationship between the two variables was not pursued in this study. Figure 3 (left) illustrates the distribution of the LAI measurements for all field campaigns in Belgium and France. In general, LAI and vegetation cover at the Belgian and French sites have a comparable value range, except for the Belgian sites during the first campaign, where LAI observations were lower.
In general, CNC (Fig. 3, right) values followed the observed LAI trends and increased during the growing season. However, there were large differences in the absolute CNC levels between Belgium and France. In Belgium the first field campaign, CNC median values   Figure 4 displays the detailed 2017 and 2018 results for the PLSR and LUT inversion models. PLSR in the 2017 data achieved better CNC retrievals (R2 = 0.52, RMSE: 27.9 kg/ ha) compared to LUT inversion, but in the independent validation of 2018 data, PROSAIL LUT inversion performed better than PLSR. Values of R2 and RMSE for LUT inversion remained almost unchanged, but PLSR results (R2 and RMSE) decreased substantially. The 2017 dataset showed two samples with very high CNC values of more than 240 kg/ha, but neither a statistical outlier test nor field or laboratory protocols indicated them as outliers and therefore both samples remained in the analysis. Nevertheless, the model accuracy in terms of R2 and RMSE is labelled both with and without those samples (Fig. 4). All VIs showed a large decrease in accuracy, when models were calibrated based on 2017 data and applied to 2018 data (Table 9). Among the VIs, REP (similar results for NDRE) performed well on the 2017 data (R2 = 0.49, RMSE = 30.5 kg/ha), but showed a decrease on the 2018 data (R2 = 0.03, RMSE = 49.0 kg/ha). The others performed similarly or worse.

Comparison of CNC retrieval methods
The weaker scheme B in general resulted in lower calibration, but higher validation accuracies compared to scheme A. In scheme B, differences between methods were less pronounced. There is in general a smaller drop in accuracy between calibration (RRMSE = 22-30%) and validation (RRMSE = 24-32%). Again, the smallest drop

Obtaining CNC from PROSAIL variables
Three strategies were identified to derive CNC from PROSAIL-retrieved variables (see Fig. 2), where each strategy was intended to examine relations between the remotely sensed LAI or CCC with field measured variables (Table 10). The three retrieval paths were also tested with the LAI product of the SNAP biophysical processor and the LAI product provided by the Sentinel-2 Value Adder of BOKU (Vuolo et al., 2016, https:// s2. boku. eodc. eu/), hereafter referred to as BOKU-LAI. For all products, the retrieval path through dry biomass as an intermediate variable worked most reliably compared to the other paths (Table 10). Figure 5 displays the retrieval path using PROSAIL retrieved LAI for estimating CCC and afterwards estimating the CNC with the relationship deducted from the field and laboratory data. Although the estimation of CCC was of medium accuracy, CNC was estimated with a RMSE of 35.4 kg/ha (24.8%). LAI and CCC reveals at lower values an underestimation, while on higher values a larger scattering occurs. In contrast the behaviour of the CNC retrieval shows a scattering along the 1:1 line neither with an under-nor an overestimation in the observed value range.

Monitoring of wheat CNC
The CNC retrievals based on a PROSAIL LUT inversion or on a PLSR seems suitable to derive CNC status on winter wheat fields during the critical growth stages  to support farmers with their N treatment decision. The map (Fig. 6) shows winter wheat fields in France during the critical vegetation stages in 2017 and the development of CNC from April to begin of June 2017. Mostly an increase of CNC can be depicted until end of May, the June image also shows already a reduction of CNC. The fourth date (12. June 2017) is already outside of the investigated period of the vegetation development and so the model performance should be interpreted carefully.

Model validation and transferability
PROSAIL LUT inversion-based CNC retrievals were successfully used to map the actual N status in agricultural fields. The prediction showed a stable error range of 34-37 kg N/ ha utilizing different Cal / Val strategies. Delloye et al. (2018) also conducted field work on conventional fields with common management practices and used a PROSAIL LUT inversion approach with an artificial neural network on a multi-year and multi-cultivar model. They achieved a comparable prediction error of 32.4 kg/ha based on S-2 data in the period of the third N-supply in May. Regarding transferability in space and time, the PROSAIL LUT inversion scheme achieved better results than empirical VI models and the multivariate PLSR. However, the PLSR achieved better results (RMSE = 31-34 kg N/ha) on validation scheme B than PRO-SAIL LUT inversion (RMSE = 34-37 kg N/ha) and PLSR could be argued to be the more suitable approach for CNC prediction in the regions from which the calibration samples were taken. In contrast, VIs showed very poor results applying the independent validation (around 50 kg N /ha) and are therefore not suitable for practical application in precision agriculture where transferability is needed. While PROSAIL and PLSR use the entire information of the spectra, VIs are limited to a few spectral bands. Nevertheless, VIs are easy to use and the results can be achieved quickly, but they reduce the spectral information in such a way that it can also result in a loss of prediction accuracy.
Compared to the achievement of PROSAIL LUT inversion, VIs could not show better results in scheme B. In general, the better results of the validation scheme B showed the influence of the sample population on the accuracy of the empirical (VI) models and the multivariate PLSR. In scheme A, the validation data was not in the parent population of the calibration, while with scheme B, the validation data belongs to the same population, due to the bagging procedure with randomly generated calibration and validation data sets. Therefore, the 2018 samples were part of the parent population in scheme B. That prediction accuracy is strongly dependent on the validation approach confirmed by other studies: For instance, Jay et al. (2017) observed that VIs performed better than a PROSAIL inversion scheme as long as a LOO-CV was applied; changing to an independent calibration and validation data set, the PROSAIL inversion was closer to or better than the VI models estimating crop attributes in sugar beet crops (e.g. LAI, Cab and CNC). It is also important to note that the validation dataset in scheme B was larger (n = 43) than the independent one of scheme A (n = 18). The size of the validation dataset has an influence on the prediction accuracy, as with larger datasets the influence of outliers, respectively uncertainties within the radiometric correction of the spectral reflectance, will be diminished.
Some studies stated that CNC with empirical VI models would achieve better results in the range of 8-15 kg N/ha, however, they used ground-based measurements from hyperspectral sensors (e.g. Hansen & Schjoerring, 2003;Thorp et al., 2017). The scale differences between field and satellite measurements are attributed to this offset. For instance, Li et al. (2014) simulated S-2 and Venµs data and calculated 3 band indices, reaching 27.8 kg N/ha RMSE for CNC estimation with simulated S-2 data, which is in a comparable range to the results of this study.
Another limitation is the non-uniformity of N distribution in the vertical structure of a plant, as pointed out by Li et al. (2013). S-2 or any other optical EO system is mainly observing the upper leaves which could limit the accuracy of the CNC retrievals especially at later growing stages. The translocation of after maternity stage is also visualised in Fig. 6 where CNC is partly decreasing for the maps of June.
In the study of Magney et al. (2017), an accuracy of around 16 kg N/ha was reported with NDRE as the best performing index using RapidEye data. The study was conducted across a three-year range, on different farms and growing conditions. All farms were conventionally managed with the common practices of the region. However, the study is not directly comparable with the results of this research since the transferability from one year to another was not tested and the methodology is slightly different due to a comparison with CNC at harvest time and not at day of observation.
Furthermore, a slight decrease of LAI between the second and the third field campaign must be considered. It occurred during a drought event on the French sites in April 2017, resulting in decreased biomass production. Presumably, in this situation, the true LAI had not decreased, but a reduction of the hydrostatic pressure in the leaves caused a change in leaf angles that affected the optical LAI measurements. In this case, especially the PRO-SAIL model could simulate a lower CNC than one or 2 weeks before.
Reasons for the moderate retrieval accuracies are attributed to the spatial scale difference between the pixel size of S-2 with at least 10 m and the sampled area during the field campaigns (40 × 40 cm). Although homogeneous areas were sampled, the variability of the wheat biomass and N could not be completely covered. Notwithstanding, the highest influence for data-driven influences on the retrieval accuracy is attributed to the time lag between the field campaigns and the satellite image acquisition. The maximum time lag for a few samples (n = 12) was 3 weeks between satellite acquisition and field work (see Table 3), although most acquisitions were less than 1 week apart. Depending on the development of the plants and the length of time lag uncertainties to the model accuracy can be introduced. However, this problem is already reduced nowadays by the launch of the second Sentinel-2B sensor and the resulting higher observation frequency. Still, in Central Europe a certain time lag due to cloudy data often needs to be accepted.

Retrieval strategies
The steadiest retrieval strategy was that using dry biomass as an intermediate step. This was observed for all tested input LAI products (PROSAIL-LAI, SNAP-LAI, BOKU-LAI). The correlation between LNC and biomass was high, which could explain this model behaviour. It is known that the N application on high levels like it is the case for commercial fields, leads to an increase in biomass production. Thus, it is reasonable that the consideration of biomass for the CNC estimation increases the prediction accuracy.
Reasonable retrievals with a similar accuracy as the biomass-driven models would also be expected from the direct estimation with CCC, as the Cab is expected to show a high correlation with the N content. However, in a non-shortage N situation, CCC is probably saturated and then biomass is the better estimator. Delloye et al. (2018) pointed out that the interpolation of Cab in time is challenging and may lead to significant uncertainties. Another reason for the weak relationships between CCC and CNC can be the way that the CNC was measured: The entire above ground canopy was taken and analysed for CNC of the entire wheat plant, while for instance Houlès et al. (2007) reported a strong relationship between leaf chlorophyll content (LCC) and leaf N content. However, the correlation in between Cab and LNC was practically absent (-0.08) in this study. LNC showed moreover moderate negative correlations with FM and DM, indicating a dilution effect Justes et al., 1994) where the concentration of N decreases with increasing biomass during plant growth. The strong positive correlation between CNC and DM or FM and the weak negative correlation between CNC and LNC shows that CNC is mostly driven by biomass and not by LNC. The product of LAI and Cab to CCC is more reliable than the single estimation of LAI and Cab itself (Weiss et al., 2000). The Cab/SPAD values measured were of a comparatively high level on all sites. Other authors revealed a larger range of Cab / SPAD values (Baret et al., 2007;Uddling et al., 2007;Zhu et al., 2012) mentioning that plant stress caused by N shortages could hamper the Cab production and lead to a lower leaf Cab content than in a well-supplied N system.

Conclusions
This study analysed the predictive capacity of three widespread methods in estimating LAI and CNC with two different validation schemes. The two main conclusions drawn from the results are: 1. The PROSAIL retrieval scheme was able to predict CNC (on dry biomass) with an RMSE of around 33.9 kg/ha CNC on the 2017 dataset and 36.8 kg/ha CNC on the 2018 dataset. When using scheme A, PLSR shows a calibration error (2017 dataset) of 27.9 kg N/ha and a validation error of 38.4 kg N/ha (2018 dataset), and the VI-based CNC retrievals also show good results between the PLSR and PROSAIL results, while the validation error with scheme A reached around 50 kg/ha or more for all tested VIs. This shows that the PROSAIL inversion scheme was more stable than the PLSR and the VI methods when applied to new data. Using scheme B, PLSR performed similarly to the PROSAIL inversion scheme, while the VIs also showed slightly weaker results here in the validation than in both other methods. For unknown data, it seems that PROSAIL retrievals are more predictable concerning the error margins than purely statistics-based models. 2. The modelling path using dry biomass as step stone for estimating CNC performed best and was most reliable in this study. Using the PROSAIL-LAI, it performed with less than 35-40 kg N/ha RMSE. The retrieval method based on fresh biomass or CCC was, concerning accuracy, close behind. Nevertheless, both methods were not as steady as the dry biomass to CNC model. The same behaviour was observed by running the empirical retrieval chains using the LAI product of BOKU and from the SNAP biophysical processor (Table 10). Due to the weaker LAI estimation of BOKU and SNAP on the wheat samples, the other modelling chains led to much higher RMSE than the PROSAIL-based estimates (> 50 kg N/ha).
All methods presented are in principle able to support farmers with knowledge of the crop state and can therefore allow them to adapt their fertilization management according to the growing conditions. Despite the error margin being around 35 kg N/ha and maybe still too high, farmers are experienced on their own field sites (soil, weather, potential compaction layers) and can use this additional spatial information (Fig. 6) to make better decisions on fertilizer applications. An important limitation of this study was the temporal gap between image acquisition and reference data collection which was partly related to the availability of only one S-2 satellite in orbit. With the two-satellite constellation of S-2, a denser time-series of image data can be expected. A combination of S-2 and Landsat-8/-9 images could provide the temporal image density required for an operational service, e.g. bi-weekly cloud-free images.
Compared to multi-spectral systems, future candidate operational hyperspectral satellite missions (CHIME, SBG) have a higher potential for accurate N retrieval. They will provide a better spectral resolution in absorption features of N-bearing leaf constituents in the visible to red edge (chlorophylls) and particularly in the SWIR domain (proteins).