Predicting site-specific economic optimal nitrogen rate using machine learning methods and on-farm precision experimentation

Applying at the economic optimal nitrogen rate (EONR) has the potential to increase nitrogen (N) fertilization efficiency and profits while reducing negative environmental impacts. On-farm precision experimentation (OFPE) provides the opportunity to collect large amounts of data to estimate the EONR. Machine learning (ML) methods such as generalized additive models (GAM) and random forest (RF) are promising methods for estimating yields and EONR. Twenty OFPE N trials in wheat and barley were conducted and analyzed with soil, terrain and remote-sensed variables to address the following objectives: (1) to quantify the spatial variability of winter crops yield and the yield response to N using OFPE, (2) to evaluate and compare the performance of GAM and RF models to predict yield and yield response to N and, (3) to quantify the impact of soil, crop and field characteristics on the EONR estimation. Machine learning techniques were able to model wheat and barley yield with an average error of 13.7% (624 kg ha−1). However, similar yield prediction accuracy from RF and GAM resulted in widely different economic optimal nitrogen rates. Across sites, soil available phosphorus and soil organic matter were the most influential variables; however, the magnitude and direction of the effect varied between fields. These indicate that training a model using data coming from different fields may lead to unreliable site-specific EONR when it is applied to another field. Further evaluation of ML methods is needed to ensure a robust automation of N recommendation while producers transition into the digital ag era.


Introduction
Nitrogen (N) management is one of the most critical management decisions to maximize yield and profit in cereal grain production. However, applying sufficient N to meet the crop demand while minimizing environmental N losses is still a challenge (Rockström et al., 2009;Zhang et al., 2015). This challenge remains due to the high variability of the economic optimal N rate (EONR) associated with the spatial and temporal variability in the yield response to N (Kahabka et al., 2004;Mamo et al., 2003;Pierce & Nowak, 1999) and to the uncertainty in modeling the relationship between N rates and yields (Kyveryga et al., 2009;Scharf et al., 2005;Tremblay et al., 2012). Despite the extensive knowledge about the complexity of N recommendations, current N recommendations are usually simplified versions of the true yield response function and they are not as accurate as needed to provide consistently reliable estimates of N needs across years at the field or subfield scale (Morris et al., 2018). For example, the use of yield goal as a proxy for estimating the EONR is a simplified method that often fails due to the lack of correlation between yield and EONR (Bachmaier & Gandorfer, 2009;Rodriguez et al., 2019;Scharf et al., 2006). Accurate recommendation systems are needed to optimize economic returns for farmers, maintain or increase yield required to meet global food demand and preserve the long-term functionality of terrestrial and aquatic ecosystems. Large variation in EONR poses a great challenge in attaining these goals.
Site-specific crop N management has been proposed as an effective way to increase N use efficiency (NUE), profit and yields by capturing the spatial and temporal variability of the optimal N rate (Mamo et al., 2003;Mulla, 2013;Trivelli et al., 2019). Unfortunately, the selection of a site-specific EONR is difficult to predict (Setiyono et al., 2011;Van Es et al., 2006). A site-specific N recommendation relies on the knowledge of the heterogeneous yield responses to N within the field and the ability to describe the response as a function of crop management, soil characteristics and weather (Pringle et al., 2004;Trevisan et al., 2021). However, site-specific information about crop yield responses to N rate is limited (Trevisan et al., 2021) and the benefits of a site-specific N management are often not fully understood (Lobell, 2007;Whelan & McBratney, 2000). Thus, the level of adoption has remained low despite the increasingly available site-specific technologies (e.g., yield monitors, remote sensing and variable rate applications). Scalable availability of sitespecific information is fundamental to improve N recommendations, benchmark the value of precision and digital agriculture technology in crop production systems and promote adoption (Morris et al., 2018;Rodriguez et al., 2019).
Fortunately, variable-rate technology can be used to systematically control input levels in highly mechanized, large-scale production systems, making it possible to run large-scale on-farm precision experimentation (OFPE) (Kyveryga et al., 2009;Piepho et al., 2011). These experiments generate large amounts of site-specific yield response to input data that can be used, for example, to understand the spatial variation of EONR . OFPE can be combined with inexpensive soil, terrain and crop characteristics such as apparent electrical conductivity, organic matter, elevation and vegetation indices (VI). This enables researchers to better capture the spatial structure of field attributes and their effect on yield response to N (Derby et al., 2005;Kitchen et al., 2008;Tremblay et al., 2012). Thus, site-specific yield response models can be estimated (Morris et al., 2018;Rodriguez et al., 2019). Therefore, OFPE allow for testing whether input rates can be profitably varied within fields by matching site-specific requirements (Bachmaier & Gandorfer, 2009;Lark & Wheeler, 2003).
Machine learning (ML) methods, such as random forest (RF), have gained popularity in fields related to yield modeling and N status estimation due to their capability of processing large and different types of data and analyzing both linear and non-linear relationships (Chlingaryan et al., 2018;Krause et al., 2020;Qiu et al., 2016). However, ML models tend to require large amounts of data and results could be difficult to interpret (Chlingaryan et al., 2018). The generalized additive models (GAM) may offer a middle ground between sophisticated ML techniques and simple linear models allowing for reliable predictions of complex processes while maintaining interpretability (Gardner et al., 2021;Wood, 2017). The large amount of site-specific yield response data from OFPE provides an opportunity for deploying site-specific ML models for yield and EONR estimation. To the authors' knowledge, their use for EONR prediction has not been yet tested (Liakos et al., 2018).
This study proposed to improve the understanding of the spatial variability of crop yield response to N and to deploy ML based models to predict yield and EONR. OFPE on wheat and barley, field-specific crop and soil characteristics data and ML methods were used to address the following objectives: (1) to quantify the spatial variability of winter crops yield and the yield response to N using OFPE, (2) to evaluate and compare the performance of GAM and RF models to predict yield and yield response to N, and (3) to quantify the impact of soil, crop and field characteristics on the EONR estimation.

OFPE designs and treatments
Thirteen wheat and seven barley N rate trials were conducted on commercial fields within the northwest region of Buenos Aires Province, Argentina (37.2017° S, 59.8411° W) from 2017 to 2019 (Fig. 1A, B). Soils in the experimental sites were sandy loams, thermic Typic Hapludoll, corresponding to the transitions between hills and the lowest areas; sandy loams, thermic Entic Hapludolls, corresponding to the sandy hills and Thapto-argic Hapludolls, associated with shallow soils with presence of a clay pan layer at fluctuating depths. Precipitation varied from 680 to 1012 mm with an average of 850 mm across sites and years.
Field trials were part of the Data-Intensive Farm Management (DIFM) project described in Bullock et al. (2019), with N as a treatment (DIFM N-trial). Each DIFM-N trial had six target N rates applied as broadcasted urea (0,46,92,138,184 and 230 kg N ha −1 ) or sprayed as urea ammonium nitrate (0, 32, 64, 96, 128 and 160 kg N ha −1 ) at early tillering. The allocation of the N rates in the experimental units followed a completely randomized factorial design replicated over a portion of the field considered to be the most spatially variable in soil properties and yield within the field (Fig. 1C). All N rates were implemented in the field using a variable rate fertilizer applicator and the as-applied data were used for further analysis. Each experimental unit with a unique N rate ("plot") was given a dimension to match the swath width of the machinery available. Equipment swath varied from 12 to 35 m for the N applicators and from 6 to 12 m for the harvester. Size of the plots were adjusted to ensure two harvester passes within the as-applied N area. Thus, plot size varied among years, from 90 m 2 to 200 m 2 . Other farming practices were kept constant throughout the field, and they were conducted by the farmers following standard protocols for the region.

Field measurements and remote sensing data
Soil apparent electrical conductivity at 0.30 and 0.90 m depth (EC30 and EC90, respectively), soil organic matter (SoilOM), P-Bray (SoilP) and pH (SoilpH) to a depth of 0.20 m were measured at each site. Soil ECa was collected in a series of parallel transects spaced approximately 30 m apart, using a Veris model 3100 (Veris Technologies, Salina, Kansas, USA). The ratio of the electrical conductivity at 0.30 m and 0.90 m was calculated (ECratio). Soil sampling was performed following a systematic grid with a density of one sample per hectare during the fall season after harvesting of the previous crop. Elevation data (Elev) was measured with a real time kinematic system (RTK, Trimble 5700, USA) and terrain parameters such as aspect (Asp), curvature (Curv), slope (Slo), terrain position index (TPI) and topographic wetness index (TWI) were derived using ArcMap 10.7 raster calculator (Environmental Systems Research Institute, Redlands, CA, USA). Two VIs were calculated using Sentinel-2 via Google Earth Engine platform (Gorelick et al., 2017), the normalized difference red edge index (NDRE) and the normalized difference vegetation index (NDVI). Previous findings have promoted these VIs as important indices to track key traits such as biomass, N status and leaf area index (Foster et al., 2017;Magney et al., 2017;Samborski et al., 2015). Published research supports the use of proxies, such as NDVI or NDRE, of the previous crop biomass to improve the prediction of optimal N rates due to their role in the N cycle (Archontoulis et al., 2020;Puntel et al., 2019). The NDRE from the previous crop (NDREpc) and NDVI from the previous crop (NDVIpc) were selected from the remote sensing variables based on their low correlation with other covariates and high correlation with yield.

Data processing and spatial aggregation
The DIFM N-trial OFPE prescriptions ( Fig. 2A) were used to select the raw as-applied data from the feedback sensors in the variable rate applicator corresponding to the experiment area within the field. Raw as-applied point data from the applicator controller was transformed into polygons based on the distance and width of the machinery (raw as-applied polygon areas). Raw as-applied polygon areas (Fig. 2B) were cleaned by removing overlapping polygon areas and outliers. To facilitate data aggregation of field characteristics, consecutive as applied polygons (Fig. 2B) corresponding to N rates within 20 kg N ha −1 from each other were merged into a single as-applied uniform N application area (Fig. 2C).
Yield data was collected using combine yield monitoring system and cleaned with the Yield Editor 2.0 software (Sudduth et al., 2012). Further quality control of the yield data was performed by relating it with NDVI (Trevisan et al., 2019a). Field_ID 12 and 13 were removed from the database due to lack of quality in the data. These fields had extremely low and high yield values, abnormal patterns across the field and poor correlation between yield and maximum NDVI from the growing season. Cleaned yield data points were transformed into polygons using the width and the traveled distance recorded in the yield monitor (now called "yield harvest area") and an overlap between yield polygons was allowed up to 5%. Final database contained 40% barley and 60% wheat yield data points.
Yield harvest area was then spatially joined with the as-applied uniform N application area (Fig. 2C, Fig. S1) and interpolated field characteristics. Point-based data layers (e.g., EC and grid soil sampling) were interpolated using empirical Bayesian kriging at one-meter square resolution. Soil properties and VIs characteristics were averaged within the yield harvest area. Data processing was performed using ArcMap 10.7 (Environmental Systems Research Institute, Redlands, CA, USA) and Quantum GIS (QGIS Development Team, 2021).

Analysis of spatial yield variability
To evaluate the yield variability between and within fields (spatial), the coefficient of variability (CV) was used (Kravchenko et al., 2005). The CV for yields with no N fertilizer (Yield-N0, less than 26 kg N ha −1 ), and yield with high N fertilizer (Yield-NH, more than 161 kg N ha −1 ) were calculated per field, year and across all site-years. In addition, the mean yield response to N was calculated as the difference between Yield-NH and Yield-N0 and analyzed for statistical differences using a t-test and considered significantly different at a p value < 0.05.

ML yield modeling
Yield was modeled using RF and GAM. The explanatory variables used to fit the models vary field by field (see Table S1 for a selection of the variables characterizing each field). The "grf" and "mgcv" packages in R were used to train GAM and RF models, respectively (Tibshirani et al., 2022). The RF model does not require explanatory variables to be specified quantitatively in relation to the dependent variable. In contrast, the GAM model needs to specify what variables to be modeled flexibly using smoothing splines and what and how variables interact with each other. Let x j denote j explanatory variable ( j total number of variables) and N denote nitrogen rate. The yield model for GAM is specified as follows: In the above formulation, f () is the smoother of the impact of nitrogen on yield, g j () is the smoother of the impact of x j , and x j × N is the interaction of nitrogen rate and x j that captures how the impact of N is altered by other variables ( x j ). For both f () and g(), thin plate splines are used for smoothing (Wood, 2003). Yield predictions were run for each yield harvest area.
RF and GAM were tuned using spatial cross-validation. When spatial autocorrelation is present in the dataset, such as DIFM N-trials (Miller, 2004), hyper-parameter tuning using spatial cross-validation is recommended instead of regular cross-validation (Lovelace et al., 2019;Vucetic et al., 1999). RF was tuned on three hyper-parameters and two sets of variable collections using spatial cross-validation. Specifically, the number of variables used for sample splitting, minimum number of observations per tree leaf and the sample fraction parameter for RF were tuned. Further, two sets of variable collections were considered. The first set included all the explanatory variables, and the second set included only variables that are not highly correlated with one another. For the second set, one of the variables with a correlation coefficient higher than 0.8 was dropped. For GAM and RF, spatial cross-validation was conducted to select one of the two sets of variable collections based on the model performance (Table S2).
Spatial cross-validation was conducted using tenfold spatially clustered train and test datasets generated using the "spatial sample" package (Silge & Mahoney, 2022). The area per fold varied from 2 to 4.5 ha with a range of 200 to 300 observations per fold. Figure 3 illustrates how spatially clustered train and test datasets were laid out for the 10 folds for field 1 as an example. The optimal set of hyper-parameters and variables that had the lowest average root mean square error (RMSE) of yield prediction from the spatial cross validation for each model and field was selected (Table S2). RF and GAM were then trained using all the observations entire dataset with their respective selected hyper-parameters and a set of variables for each field. For RF, the number of trees was set to 2000 to stabilize predictions.

Estimation of EONR and YEONR
Site-specific EONR for each field was obtained by solving the profit maximization problem at each yield harvest area. Let s denote a site within a field.
In the above equation, f () is the trained model, f N, X s is the estimated yield based on nitrogen rate (N) and characteristics of the site X s , Pw is the price of wheat and barley (set at $160 Mg −1 ), Pn is the price of nitrogen (set at $0.76 kg −1 ).
To test the performance of RF and GAM models at estimating site-specific EONR, estimated EONR was compared against a local approximation of the "true" EONR. It is an extremely hard task to verify the accuracy of the estimated site-specific EONR because the "true" EONR site-specifically is not observed. The EONR is always derived from yield observations. In contrast, validating models for yield prediction accuracy is straightforward because real yields are observed. Here, the same tenfold test datasets used for spatial cross validation was used to check the accuracy of the simulated site-specific EONR. Specifically, the following steps for each fold were taken: (1) fit a quadratic plateau model using the test dataset to estimate the yield response to N function, (2) derive a local EONR based on the estimated yield response function obtained in the previous step using Eq. (2), (3) calculate the average of the EONRs estimated using RF and GAM for all points, (4) Fig. 3 Example of the train and test datasets used in spatial cross-validation (Field_ID 1) compare the estimated EONR with the average RF and GAM estimated EONR. For this analysis, fields with less than 1000 observations were excluded to ensure that each fold has at least 100 observations. After this screening, 14 fields had sufficient observations per fold to perform this analysis. It is worth emphasizing that this testing of EONR estimation is not a formal test of how accurate the site-specific EONRs from RF and GAM were. The reference points for the RF and GAM are not the true EONR, but instead an estimate of EONR derived from localized regressions as described above. The proposed method is intended to give a partial insight into how well the ML EONR estimates were.

Drivers of variability in yield response to N
The relative importance of each predictor in the ML models were analyzed to understand what factors contributed to the spatial variability of EONR. To do this, for each site, first the RF model was trained to estimate the site-specific EONR using the same explanatory variables used to estimate yield. Then, the Shapley value analysis (Lundberg et al., 2017) was run for each explanatory variable using the "shapr" package (Sellereite et al., 2020). The Shapley values obtained from each explanatory variable measured the degree and direction of their contribution to the heterogeneity (variation) in the estimated EONR. Shapley values are widely accepted measures to enhance the interpretability of machine learning methods (Redell, 2019), and they are more suitable than the variable of importance measure because variable of importance does give the direction of the impact. Please note that Shapley value analysis on the model trained to predict yield (not EONR) will only give information about what factors contribute variation in yield but not in EONR.

Observed spatial variability of yields
In this study, the spatial variability of wheat and barley yield response to N was quantified using OFPE. Yield ranged from 1.32 Mg ha −1 to 8.83 Mg ha −1 with an average of 4.54 Mg ha −1 across all sites, which is higher than the national average (3.2 Mg ha −1 , MAGyP, 2019). Wheat yield was on average 4.7% higher than barley (4.67 Mg ha −1 vs 4.46 Mg ha −1 , respectively).
The relationship between yield and N showed a non-linear pattern in most sites (Fig. 4). In 12 sites, a significant and positive response to N was found (yield-NH minus yield-N0 > 0) with a maximum of 1.5 Mg ha −1 (Fig. 4, Table 1, p < 0.05). The magnitude of the yield response to N is in agreement with other published studies in similar climatic conditions (Barbieri et al., 2008). Two sites showed a reduction in yield at NH when compared with yield at N0 of 0.19 Mg ha −1 and 0.33 Mg ha −1 (Fig. 4, Table 1, p < 0.05). A negative yield response in wheat and barley is rare in the northwest region of the Buenos Aires province. The lack of yield response could be partially attributed to errors associated with the yield monitoring process and temporal shifts in the yield monitor and as-applied data. As reported in other studies, the dispersion of yield values for the same input rate is much higher in OFPE than what is commonly observed in small plot research and can be also associated with the spatial variability of yield that would be observed even if inputs were applied at a uniform rate throughout the field (Fig. 4, Trevisan et al., 2019a). Despite the variability in the yield response curves, a subset of sites showed a clear positive yield response to N rates (e.g., Field_ID 1 and 16). The distribution of points also supports the decision to use linear regression to represent the yield responses.
The CV within fields for yield-N0 ranged from 5.3% to 28.5%, and for yield-NH from 4.7% to 31.7% (Table 1). Within-field yield variability is mainly associated with the spatial variability of the field soil properties and landscape position (Tremblay et al., 2012). Interestingly, a significant and positive correlation between the yield CV and the size of the N trial was found (R 2 = 0.29, data not shown). This indicates that the yield CV not only depends on the natural variability of the field but also on the experiment's scale and field location (Trevisan et al., 2019a). A larger experimental area could represent bigger within-field spatial variability in factors (e.g., organic matter, texture) other than the treatments, also known as random effects (Kravchenko et al., 2005). The larger scale of the OFPE compared with the plot-scale research gave the opportunity to better estimate the treatment and the random effects under different soil and crop conditions, therefore providing valuable data for ML modeling purposes (Rodriguez et al., 2019). However, further research is needed to optimize the size of the OFPE to better represent spatial variability of the site .
Despite the challenges associated with the on-farm research trials such as the precision of the trial implementation, data quality and processing Kyveryga, 2019;Trevisan et al., 2021), results showed the value of OFPE to benchmark current production systems at the field level and help to quantify the impact of a particular nutrient management practice such as N management (Trevisan et al., 2019b). Most of the fields showed an average yield-N0 and yield-NH CV greater than 10%, suggesting potential economic benefits from a variable rate N application considering the site-specific yield response to N fertilization (Bullock et al., 1998;Robertson et al., 2008). For this study region, a site-specific N rate might be needed to maximize yields and profit (Fig. 4).

ML modeling of yield, EONR and YEONR
The performance of GAM and RF models to predict yield and yield response to N was evaluated and compared. Figure 5 presents the RMSE of the yield prediction for the trained data set (top panel) and the average of RMSE for the tenfold spatial cross validation (bottom panel). RF performed better than GAM in predicting yield consistently across fields (Fig. 5). Average RMSE for the training dataset was 476 and 772 kg ha −1 for RF and GAM, respectively. In most of the fields, RF RMSE from the training dataset was ~ 10% smaller than GAM (except Field_ID 3). Differences in model performance between RF and GAM were likely due to their different ability to handle the complex interactions existing between the yield, N rates and covariates. RF and GAM suggested wildly different EONR estimates, while their YEONR estimates were similar (Fig. 6) even for fields where their yield modeling accuracy was the same (Figs. 5 and 7). A total of 14 sites showed visual differences in the EONR distributions between the two types of models (Fig. 6). In almost all fields, GAM models estimated EONR with an average of 26.7 kg N ha −1 higher than RF (Figs. 6 and 7). While both models predict no N to maximize profit in many parts of the field, GAM tended to have high Table 1 Average yield and coefficient of variation (CV) for yield with no fertilizer (Yield-N0), high fertilizer (Yield-NH), and the difference between Yield-NH and Yield-N0 (yield response) "*" statistically significant difference between Yield-N0 and Yield-NH at p < 0.05; "na" missing values frequency of N rates above zero and more heterogeneous N rates than RF (Fig. 6). Noteworthy, RF suggested almost no N application (< 10 kg N ha −1 ) for nine fields to obtain the highest profit (e.g., Field_ID 2,5,6,8,11,13,15,16 and 17). The average EONR was 43.6 and 20.4 kg N ha −1 across sites and years for GAM and RF models, respectively (Fig. 7, Table S4). Both values were below the national fertilization averages of 71 kg N ha −1 for wheat and 63 kg N ha −1 for barley (Bolsa de Cereales, 2020). The EONR ranged from 0 to 244 kg N ha −1 and 0 to 225 kg N ha −1 , for GAM and RF models, respectively (Table S4). This suggests that the yield response to N and the related EONR cannot be generalized for a region or field (Scharf et al., 2005) likely due to complex relationships occurring between soil, management and weather that govern the yield response to N (Kyveryga et al., 2009;Morris et al., 2018;Tremblay et al., 2012).

Comparison between ML site-specific EONR and a localized approximation of the EONR
Despite the comparison between ML models, the observed site-specific EONR value is unknown (not observed) and thus drawing a conclusion about what model was closer to the true EONR is not possible (Kakimoto et al., 2022). In this work, a local approximation of the "true" EONR was carried out to test the accuracy of the estimated site-specific EONR by RF and GAM. Testing the validity of a model for EONR estimation is fundamentally different from validating a model for yield prediction. For yield prediction, the observe yield values is observed directly which allow us to validate the model with ground truth data. However, the same validation with EONR cannot be done as EONR is derived from yield observations. While the local approximation of the "true" EONR is appealing, there is no statistical guarantee nor peer review literature that confirms the approach indeed works. The local estimate of EONR could be error prone. Figure 7 presents the estimated local EONR for individual folds for all the fields compared with the average EONR from GAM and RF model. RF and GAM-based EONR were not close to the locally approximated EONR (Fig. 7; Table S3). The average local EONR was 46.9 ± 54.4 kg N ha −1 . Several folds within fields seemed to be underestimated by RF or GAM, which suggest N rates below 10 kg N ha −1 while the local EONR was higher (e.g., Field_ID 2 and 7). RF-based EONR consistently underestimated the EONR by 29 kg N ha −1 on average compared to the local approximation (Tables S3 and S4). In contrast, the GAM model underestimated EONR in seven fields Fig. 6 Frequency distribution of A economic optimal nitrogen rate (EONR) and B the yield at the EONR (YEONR) by field 1 3 by 16 kg N ha −1 on average and overestimated EONR by 21 kg N ha −1 in the remaining fields. The regional average N rates of 71 kg N ha −1 for wheat and 63 kg N ha −1 for barley (Bolsa de Cereales, 2020) also corroborate the conjecture that RF-based EONR underestimates the EONR.
Despite its limitation, this analysis provided useful insights into the nature of RF and GAM-based EONR estimates. The fact that RF and GAM suggested different EONR when compared to the local approximation indicated that further research is needed to Fig. 7 Comparison between random forest (RF)-based, generalized additive models (GAM)-based economic optimal nitrogen rate (EONR) estimates and the local approximated EONR for each fold and field understand the cause of such disagreement. The model's ability to predict yield well may not mean that it will also predict EONR well (Puntel et al., 2016). While RF consistently performed better in predicting yield than GAM (low RMSE for RF, Fig. 5), its EONR prediction seemed to underperform compared to GAM (Fig. 7, Table S3 and S4). Indeed, RF consistently underestimated EONR (Fig. S2, Table S3). If this is truly the case, relying on RF can be potentially dangerous because under-application of N is much more harmful to profitability than over-application of N due to the nature of the yield response curve to N (Mandrini et al., 2021).

Drivers of the spatial variability in yield response to N
The factors contributing to the heterogeneity of EONR varied between methods and fields ( Figs. 8 and 9). For example, for GAM-based EONR estimation in field 14, the SoilP was the top contributor to the heterogeneity of the estimated EONR. Low and high values of SoilP were associated with low and high values of EONR, respectively. This indicates that EONR increases as SoilP increases. In contrast, SoilP was not found as an important factor for GAM-based EONR in field 2 (Fig. 9). These findings indicate that training a model using data coming from different fields may lead to highly unreliable site-specific EONR when it is applied to another distinct field. The EONR and the factors contributing to its variability are site-specific (Morris et al., 2018;Puntel et al., 2019;Tremblay et al., 2012). Thus, more studies are needed to generate site-specific data to develop N management strategies that better account for soil, weather and management variation under diverse onfarm conditions (Wang et al., 2020).
The variability in EONR estimated by RF was mainly explained by the measured covariates in only five fields (Field_ID 1, 3, 4, 14 and 18, Fig. 8). This is rather expected due to low variability in RF-based EONR estimates of those fields (Fig. 6). In contrast, the impact of measured covariates for GAM-based EONR were high for most of the fields (Fig. 9). Thus, the inclusion of the measured explanatory variables in GAM models contributed to explaining the variability of the EONR more than in RF models.
According to the Shapley analysis on GAM-based EONR model, SoilP is the most or second-most influential factor for the spatial variability in the EONR for six fields. However, the direction of its impact on the EONR was not consistent across the fields ( Fig. 9; Duncan et al., 2018;Holford et al., 1992). Higher SoilP led to higher EONR for four fields (Field_ID 1, 14, 17, 18) but led to lower EONR for two fields (Field_ID 4 and 13). This is not surprising given that there is evidence of interactions between N and P that affect crop root length, plant growth and N uptake (Duncan et al., 2018).
Normalized Elev was a key driver of EONR heterogeneity for fields 3, 7 and 9 using GAM model (Fig. 9). Similarly to SoilP, the direction of its impact was not consistent across fields. While a higher normalized Elev led to a lower EONR in fields 3 and 9, it led to a higher EONR in field 7. A similar observation can be made for SoilOM for fields 7 and 14. The interaction between site-specific terrain attributes, SoilOM and the weather conditions at each field probably had a unique effect on the spatial distribution of soil water content. These interactions affect the mineralization rates from the soil organic pool, which in turn, determines the spatial variability of soil mineral N and ultimately the yield response to N (Peralta et al., 2015;Ruffo et al., 2006;Tremblay et al., 2012). These results reinforce the need for recommendation models with site-specific covariates tailored to a unique (Kg N ha -1 ) Fig. 8 Shapley values of the top five contributing factors (y-axis) for random forest estimated economic optimal nitrogen rate (RF-based EONR) by field. Contributing factors (y-axis) were normalized from 0 to 1 (magnitude shown in color scale). Abbreviation; soilP: soil phosphorus; soilph: soil pH; tpi: topographic index; elevnorm: normalized elevation; ec30: electroconductivity at 30 cm soil depth; ec90: electroconductivity at 90 cm soil depth; soilom: soil organic matter; tpi: terrain position index; twi: topographic wetness index; ecratio: ec30 divided by ec90; normalized difference red edge index and the normalized difference vegetation index (ndrepc and ndvicp, respectively) (Kg N ha -1 ) Fig. 9 Shapley values of the top five contributing factors (y-axis) for generalized additive models estimated economic optimal nitrogen rate (GAM-based EONR) by field. Contributing factors (y-axis) were normalized from 0 to 1 (magnitude shown in color scale). Abbreviation; soilP: soil phosphorus; soilph: soil pH; tpi: topographic index; elevnorm: normalized elevation; ec30: electroconductivity at 30 cm soil depth; ec90: electroconductivity at 90 cm soil depth; soilom: soil organic matter; tpi: terrain position index; twi: topographic wetness index; ecratio: ec30 divided by ec90; normalized difference red edge index and the normalized difference vegetation index (ndrepc and ndvicp, respectively) environment to optimize field N fertilizer management (Cook et al., 2013;Saikai et al., 2020).

Conclusions
The presented approach showed that DIFM N-trials could be used to characterize yields, yield response to N and to test ML models to estimate EONR. This work is the first study predicting yield and EONR in winter cereal crops using OFPE and ML techniques.
Across all sites, yield ranged from 1.32 Mg ha −1 to 8.83 Mg ha −1 with an average of 4.54 Mg ha −1 and a yield response to N from 0 to 1.5 Mg ha −1 that varied between and within fields. Even though GAM and RF models performed well at predicting yield (RMSE of 476 and 772 kg ha −1 for RF and GAM, respectively), both approaches produced widely different EONR values even for fields where their yield modeling accuracy was the same. In addition, ML produced very different EONR estimates when compared to the local approximation. This indicated that further research is needed to understand the causes of such disagreement.
The factors contributing to the heterogeneity of EONR varied between methods and fields. Across sites, SoilP, SoilOM, terrain attributes and EC were the most influential variables; however, the magnitude and direction of the effect varied between fields. These findings indicate that training a model using data coming from different fields may lead to highly unreliable site-specific EONR when it is applied to another distinct field. Further evaluation of ML methods is needed to ensure a robust automation of N recommendation while producers transition into the digital-ag era.