Introduction

Some aspects of modern agricultural systems may be perceived as inefficient. For example, Australian wheat production between 1996 and 2010 was estimated to be about half of what could be achieved under best management practices (Global Yield Gap and Water Productivity Atlas, 2022); similar numbers are found for major crops in Brazil (e.g. soybean and sugarcane) and the USA (e.g. wheat). According to Hochman and Horan (2018), suboptimal nitrogen (N) management is a main contributor to the yield gap in Australia. Yet, estimating the optimal N fertiliser rate to apply can be a challenging task for farmers since the crop demand for the nutrient and the supply by the soil can vary significantly, both spatially and temporally. In addition, Australian farmers operate against a background of climate variability and economic uncertainty, which impacts their management strategies according to their risk profile. Nitrogen management is therefore an important element of risk management (Monjardino et al., 2013) such that tools which increase grower confidence in N decision making may have a large impact on farm business performance. To facilitate N management, several tools and agronomic services have become available to growers. Below, the main categories of agronomic thinking that underpin fertiliser management generally and the various tools for N application are described. For a comprehensive narrative on the various principles guiding crop mineral nutrition and fertilisation management, the work of Lemaire et al. (2021) is recommended.

Arguably the most common agronomic principle to calculate nutrient application rates is the so-called ‘nutrient budget’, or ‘mass balance’ (Meisinger et al., 2008). This approach seeks to balance the crop demand for nutrient with the supply in the system by applying fertiliser as needed. It relies on the prediction of nutrient demand based on estimates of yield potential, the assessment of soil nutrient supply and on other assumptions relating to the quality of the produced yield (e.g. grain protein content) and to the efficiency of nutrient uptake from applied fertiliser. Another common agronomic approach to determine fertiliser rates follows the concept of ‘crop responsiveness’ to applied nutrient. This strategy determines optimal application rates—for example, the rate that maximises yield—empirically based on experimentation. Traditionally, fertiliser recommendations following this approach have been based on average response curves, obtained from fertilisation trials covering several sites and or years in an area of interest (Lemaire et al., 2021). Not surprisingly, the average response curve may not fully account for the specific climate or soil conditions of each field in a given season. Thus, extrapolating recommendations to conditions different to those of the experiments themselves usually generates large errors, even in the same field (Colaço and Bramley, 2019).

Another agronomic framework that is less commonly used amongst producers relies on the notions of nutrient dilution and critical nutrient levels (Lemaire et al., 2019); herein, this approach will be referred to as ‘nutrient sufficiency’. This approach is based on monitoring plant nutrition and assuring, through the application of fertiliser, that the plant nutrient level stays at or above a predetermined threshold (the ‘critical nutrient level’). The threshold represents the minimum nutrient concentration in the plant needed to maximise plant growth or productivity and it is previously determined based on field experimentation. One important consideration is the effect of allometry and nutrient dilution in the plant; that is, the nutrient concentration and the critical level tend to decrease with larger plant biomass. Amongst many applications, these concepts are commonly implemented in ‘rules of thumb’ to interpret tissue nutrient analysis and guide in-season fertiliser interventions. Somewhat analogous to this approach, is the ‘soil nutrient bank’ strategy for fertiliser management (Meier et al., 2021). In this case, the focus is on monitoring soil nutrient levels as opposed to the plant nutrient level to assure that the amount of mineral N in the soil is above a given threshold. Critical for this strategy is to determine, through previous experimentation, an adequate threshold—that is, an adequate ‘size’ of the soil mineral N ‘bank’ for different soil types and production systems. Although these different decision methods seek a similar outcome (maximum crop yield with minimum use of fertiliser), they may result in different recommendations due to the different points of reference being used (i.e., different crop and soil information) and the different assumptions underpinning the recommendation. Moreover, these different approaches vary in their capacity to be deployed in the field and implemented spatially using Precision Agriculture (PA) tools given the likely spatial variation in the key input data.

Over the past three decades, crop sensing technologies have been used to implement some of the above frameworks in a semi-automated fashion to enable site-specific fertiliser application and potentially contribute to closing yield gaps associated with suboptimal N management. The Nitrogen Fertilisation Optimisation Algorithm (NFOA; Raun et al., 2005), which uses a mass balance concept, and the Holland and Schepers (2010) approach based on crop responsiveness, were pioneer sensor-based approaches developed to support site-specific nutrient management. These and similar strategies have been tested for many crops in different countries, becoming one of the most investigated topics in PA research (Colaço & Bramley, 2018; Franzen et al., 2016). They have also been implemented in commercial systems around the world, although with arguably low adoption by farmers (Lowenberg-DeBoer and Erickson, 2019; Bramley & Ouzman, 2019).

Despite these many years of development, a recent review found great variation in the reported benefit from the use of sensor-based N application, with about 25% of studies reporting economic losses (Colaço & Bramley, 2018); Diacono et al. (2013) similarly questioned the economic benefit of precision N fertilisation. Colaço and Bramley (2018) highlighted the fact that most sensor-based strategies treated the N decision as univariate; that is, they rely solely on the measurements of a single type of sensor, despite the inherent complexity of N management decisions, which was suggested as a major reason for observations of poor performance. Moreover, sensor-based methods have been framed around agronomic mechanistic understanding, albeit with simplifications to suit the decision tool, with the sensor technology used to estimate the crop parameters needed to deploy the decision framework. For example, in the mass balance concept used by the NFOA sensor approach, the vegetation index derived by the crop sensor is used to estimate grain yield potential, but the grain quality (i.e., grain protein content) which ultimately determines the total nutrient demand and the fertiliser nutrient recovery by the plant are assumed. Thus, aside from the inherent limitation of the framework itself—the uncertainty of the yield estimation—there are built-in sources of error in the calculation of the recommendation which can compromise its effectiveness for site-specific nutrient management. These issues were thoroughly demonstrated in Colaço and Bramley (2019) and Colaço et al. (2021a). An alternative approach has been suggested where sensed variables inform a machine learning model that predicts the optimal N rate directly. This approach was considered more capable of handling system complexity inherent to N decisions. As shown in Colaço et al. (2021a), this approach outperformed agronomic mechanistic frameworks used by more traditional sensor-based strategies, although the results were based on a single-site, plot-based experiment.

The limited ability of plot-based trials to reflect the inherent farm- and field-scale variability, has been a common constraint in sensor-based technology research (Colaço & Bramley, 2018). Another main issue has been the fact that most studies have used subjective assessments of sensor-based technology by testing their use strictly against current farm practice (Colaço & Bramley, 2018). Consequently, the relative benefit from sensors claimed in these studies depends on how good the current management practice is. Ideally, all management practices in a comparison study, which may include sensor-based approaches and farmer’s practice, should be evaluated against optimal management and then compared amongst themselves. On-farm experimentation (OFE)—field-scale experiments conducted in partnership with farmers (Lacoste et al., 2022) and implemented and monitored using PA technology—is key for enabling both large scale trials and the observation of optimal N rates for the comparison of alternative recommendation methods.

This work sought to assess the recommendation accuracy and the impact on farm profitability of a range of N fertiliser recommendation methods for wheat and barley production in Australia. A parallel goal was to investigate new approaches which may improve farmer confidence in N decision-making in the face of the risks associated with N management. A total of thirteen methods were evaluated including approaches based on various types of digital technologies, input data and frameworks along with a number of currently utilised benchmark methods. The hypothesis of this study was that, with access to a variety of sensing technologies, an on-farm experiment and potentially useful data from a range of on- and off-farm sources, farmers ought to be able to make mid-season N decisions with greater confidence than is possible using approaches typically used at present. Comparisons of these approaches were made using 21 large-scale on-farm field trials spanning the three main Australian grain producing regions over four years.

Materials and methods

The methodology for this study had three main components. The first was the implementation of on-farm crop response experiments, including strips of high (‘rich’), low (‘zero’) and intermediate (‘field’) N rate treatments that were harvested by headers equipped with onboard yield and protein monitors. An important element of these was that they could be implemented by collaborating farmers using the farmers’ own equipment. The second was a site-specific analysis of harvest data to assess the crop response to fertiliser application using a moving window regression approach along the length of each strip trial; this step generated local response functions and estimates of optimal fertiliser rates. The third was the assessment of the different fertiliser management strategies of interest using the local response functions generated in the previous step, with the various strategies assessed based on their profitability and the recommendation error benchmarked at each site against the observed economically optimal N rate (EONR).

The processing of spatial data, including data cleaning (elimination of outliers), spatial interpolation and map generation, was performed in QGIS 3.10 (QGIS Development Team, 2022) using the PAT plugin (Precision Agriculture Tools; Ratcliff et al., 2020) coupled with Vesper 1.62 (Minasny et al., 2005) for kriging interpolation when needed. Analysis, including the moving window regression, statistical modelling for the N recommendations and comparison of the N recommendation methods, was done in R 3.3.0 (R Core Team, 2022).

Field program and experimental design

A large program of OFE was conducted between 2018 and 2021 as part of a multi-agency research project entitled ‘Future Farm’. A total of twenty-one field trials were established in commercial fields of wheat and barley covering a range of soil types and weather conditions across Western and South Australia, Victoria and New South Wales (Table 1). The main goal of the trials was to enable the assessment of crop response to N application and the observation of optimal N rates in a spatially distributed manner; that is, covering the expected range of variation in crop performance within each field. Second, it sought to provide a range of crop N availability scenarios from which field calibration data could be collected for the development of sensor-based N recommendation approaches.

Table 1 List of sites used in the program of field trials

The fields were implemented with a strip trial comprising adjacent strips of an N ‘rich’ and an N ‘zero’ treatment (Fig. 1). The strips were positioned to run across ‘management zones’ determined through analysis of available spatial data such as previous yield maps, remotely sensed imagery and electromagnetic and gamma soil survey data when available. There was some variation in the specific design of each trial depending on the farmer equipment and farmer preference at each site. For example, in Fig. 1, as for the majority of trials, the N strips were 36 m wide, equivalent to three header widths; in some other trials, strips were equivalent to one or two header widths (12 or 24 m). Typically, crops were sown and grown according to normal farmer practice. At, or soon after sowing, the N ‘rich’ strip was established with N applied at approximately twice the normal farmer rate to create an environment of non-limiting N supply. Also, an N ‘zero’ strip was established with no additional application of N beyond that applied at sowing; the remainder of the field was fertilised according to normal farmer practice with the area adjacent to the strips used as the ‘normal field’ treatment.

Fig. 1
figure 1

Example of an N strip experiment and point sample location layout in a 357-ha wheat field near Kalannie-WA, 2019

Field data were collected at various stages during the season, starting prior to sowing when soil samples were collected from the top 0.3 m depth layer at targeted locations along the length of each strip (Fig. 1) for general soil characterisation (e.g., the analysis of soil texture, organic matter, pH, etc.) and analysis of mineral N; these analyses were undertaken in an independent commercial laboratory. There were between 15 and 21 sample locations per trial (5–7 per strip), depending on the field size. At Zadoks growth stage 31 (or GS-31; Zadoks et al., 1974), the trial was scanned with a sensing system for measurements of various vegetation indices and machine vision crop feature extraction (see details below). At the same time, plant samples were collected from the same locations as for the pre-sowing sampling. For each plant sample, two plant rows of 1 m length were cut at the ground level and analysed for dry biomass and plant N concentration. Prior to harvest, plant samples (two rows of 1 m length) were collected near the same targeted locations for plant and grain analysis of dry biomass and N concentration. The fields were harvested using machines equipped with onboard yield monitors and GNSS (Global Navigation Satellite System) guidance. In most cases (in 15 out of the 21 trials), the harvester was also equipped with an onboard calibrated grain protein sensor. The protein data from grain sampling was used for the analysis of trials that did not have onboard protein sensors available.

Site-specific crop response analysis

For the site-specific assessment of the crop response to N application, a moving window regression analysis along the length of each strip trial was conducted using the harvest data. For each trial, a circular buffer of 50 m radius moved along the strip length in increments of 10 m. In each 10 m increment, the harvest point data within the buffer were selected for a local regression analysis. The 50 m radius of the circular buffer was enough to encompass data from the two N strips and from an adjacent field area. The harvest point data within each window comprised information on grain yield derived from the onboard yield monitor data, after cleaning of the dataset, joined with the grain protein information corresponding to the closest location within the same trial strip for which it was available (either from the onboard sensor or from manual sampling). These grain yield and protein information were used to determine partial profit at each yield monitor data point. Partial profit was calculated as the gross income minus the expenditure on N fertiliser based on average grain and fertiliser prices between 2018 and 2021 (Table 2); the gross harvest income accounted for grain protein premiums as described in Table 2. This local harvest data (i.e. within the moving window area) was then used in a regression analysis using a quadratic function. The EONR, the N rate that maximised partial profit, was identified from the partial profit response curve for each moving window and was then used in the comparative analysis of the various N recommendation methods tested in this study. Figure 2 demonstrates the approach on the experimental trial established at a site near Booleroo Centre in the ‘far north’ grain-growing region of South Australia (SA) in 2021, with one of the ‘windows’ highlighted, showing the harvest data points used in generating response functions to changing N rate. Each field trial had an average of 112 windows for the generation of crop response functions and EONR observations (making a total of 2352 response observations).

Table 2 Average grain and nitrogen prices (2018–2022) applied in partial profit analysis
Fig. 2
figure 2

Experimental strip design implemented in a 100-ha barley field near Booleroo Centre, SA, 2021, and a moving window analysis of crop response to N application. The graphs illustrate the local response curves and optimal N rates (ONR is depicted as red crosses) generated from data within one of the moving windows along the length of the strip trial

In order to minimise management disruption and increase farmer engagement in this work—in line with the participatory nature of this research between farmers and researchers—a strip design was chosen (as described above) as opposed to a more randomised and replicated trial, which could potentially allow more N rates to be used and better characterisation of crop N responsiveness. As a consequence, three N rates were used (the N ‘rich’, N ‘zero’ and farmer practice), but with many harvest data points available for each rate (Fig. 2). Likewise, a simple quadratic function fitted to partial profit data was used rather than more established yield response functions such as the Mitscherlich (1924) model because of the limitation posed by the restricted number of N rates available. For the purpose of this work, this experimental strategy was regarded as acceptable, especially in light of the fact that this simple quadratic function can capture profit decline that may happen at high N rates (example in Fig. 2). It is also important to note that these trials were designed in close collaboration with the farmer as an aid to fine-tuning N decision-making in near real-time in the field in which the trial was conducted. In most cases, the collaborating farmer was already using these strip trials on their own. Thus, the trial designs were maintained, with minimum requirements imposed by the project team. As such, the trials were not aimed at ‘discovery science’ or intended to inform generalised recommendations, although numerous such trials within a farm or region could contribute to these (Bramley et al., 2022).

Input variables for digital N recommendation methods

Digital variables from private and public sources were collected at each site (Table 3) and used in different combinations for the N recommendation frameworks detailed below (Table 4). Since the recommendations aimed for a mid-season decision, only variables available until GS-31 crop stage were used. These variables characterise the field history, in-season crop status, soil and landscape features and weather. In-season crop scanning in each strip was conducted at GS-31 with the use of a Crop Circle sensor (wavelengths 670 mm, 730 and 780 mm, model ACS-430, Holland Scientific, Lincoln—USA) for NDVI (Normalised Difference Vegetation index; Rouse, 1973) and NDRE (Normalised Difference Red Edge; Fitzgerald et al., 2010) measurements. The sensor was installed on an all-terrain vehicle 1 m from the ground (approximately 0.5–0.7 m from the top of the crop canopy) at nadir position, collecting data every second, which was on average, every 2.7 m along the length of the trials. Machine vision data were also collected and used to extract crop features for use in the N sufficiency recommendation approach (detailed below). An RGB action camera (model FDR-X3000, Sony, Tokyo—Japan) was installed alongside the Crop Circle sensor (1 m from the ground level) and oblique geotagged RGB images were collected every second. Each frame was post-processed using OpenCV 3.4.2 (Bradski, 2000) in Python to segment plant pixels and extract colour indices and features in the red, green and blue colour space (RGB). The colour indices and canopy features generated were the normalised green red colour difference index (NGRDI) and normalised red blue colour difference index (NRBDI), which have reported correlations with biomass and plant N concentration (Jiang et al., 2019); canopy cover was assessed as a surrogate for plant size. Readers are referred to McCarthy et al. (2022) for more details on the machine vision variable extraction methodology.

Table 3 List of variables used for digital N recommendations
Table 4 List of N recommendation methods evaluated

Publicly available variables were also collected (Table 3). Remote sensing products were obtained using the Google Earth Engine Platform (Gorelik et al., 2017) including Landsat8, MODIS and Sentinel2 imagery (NDVI and NDRE were calculated using 664, 782 and 832 nm wavelengths). Each site was characterised for a range of soil, landscape and weather variables (Table 3). In the case of land surface temperature, model parameters (phase and amplitude) from a sinusoid function fitted to a land surface temperature dataset were retrieved using the approach of Jakubauskas and Legates (2000). A brief description of all variables used is provided in Table 3. The final database comprised observations in each moving window in which the target variable was the EONR and the predicting variables were those described in Table 3. For each predicting variable, the average value within the area of the moving window was used.

Fertiliser N recommendation methods

The thirteen methods for making N recommendations tested in this work are described in this section. They include four reference (or benchmark) methods and nine alternative methods for N recommendation in the scope of precision/digital agriculture (Table 4). The following sub-sections categorise these methods by their general approach.

Reference and benchmark N recommendation methods

There were four reference or benchmark methods evaluated (the top four methods listed in Table 4). The EONR observed from the experimentation (ex-post) was the absolute reference against which all other recommendations were compared. The ‘Max Yield’ method was an additional ex-post reference method for comparison, which was the observed rate that maximised grain yield. A third reference was the ‘Farmer’ approach. This recommendation varied from farm to farm, but it generally involved a holistic assessment, through various methods, of the probable yield potential and N supply by the system, combined with considerations of economic factors. These N decisions were made either by the farmers themselves or in conjunction with their consultant agronomist. Thus, the decision was also influenced by the experience and intuition of both farmer and agronomist, and by other subjective factors such as the farmer risk profile. The recommendation was implemented at a single dose across the field.

The last reference method was the ‘Simplified Mass Balance’ which represented a standard commercial practice in which limited on-farm information is available. To calculate the total N demand, publicly available information of historic water-limited yield potential at the local region level was used, sourced from the Yield Gap Australia website (CSIRO, 2021) and target grain N concentrations equivalent to 13% (wheat) and 11% (barley) grain protein content. The soil N supply was set to 65 kg/ha, based on the average information obtained from the pre-sowing soil sampling across the trials; such generalised soil N supply information was used (as opposed to the field average) to simulate a best guess by the farmer/agronomist when soil samples are not collected in the field, as is the common practice amongst producers. The final N recommendation was calculated as the N demand minus the soil supply divided by a fertiliser N grain recovery of 0.3 (Eq. 1). The fertiliser recovery factor represents the relative amount of applied N that ends up in the grain and was calculated as the difference in grain N removal (in kg/ha) between the ‘normal field’ and the ‘zero’ strip relative to the difference in applied N between these two areas and averaged across all trials (Eq. 2).

$$N\;rate = \frac{{(Yield\;Potential \times Protein\;factor) - Initial\;Soil\;N}}{{Fertiliser\;N\;Grain\;Recovery}}$$
(1)
$$Fertiliser\;N\;Grain\;Recovery= \frac{{GN}_{field}-{GN}_{zero}}{{Applied\;N}_{field}-{Applied\;N}_{zero}}$$
(2)

where, ‘Yield Potential’ was the historic water-limited yield potential in kg/ha at the local regional level; the ‘Protein Factor’ was 0.0228 for wheat (using 13% as the protein target and 5.7 as the conversion factor from protein to N; Dalal et al., 1997) and 0.0176 for barley (using 11% as the protein target and 6.2 as the conversion factor from protein to N; Dalal et al., 1997); ‘GNfield’ and ‘GNzero’ were the grain N removal in kg/ha in the ‘field’ and ‘zero’ areas; and the ‘Applied Nfield’ and ‘Applied Nzero’ were the total N rate in kg/ha applied to the ‘field’ and ‘zero’ areas.

Digital methods based on yield prediction

Digital methods based on a ‘mass balance’ N recommendation were evaluated (5th and 6th methods listed in Table 4). Inspired by the NFOA approach (Raun et al., 2005), they were based on the prediction of crop yield using crop sensing and on information from an N ‘rich’ reference area (see below). The mid-season N requirement, assessed at GS-31 stage, was calculated as the predicted yield in the N ‘rich’ treatment minus the predicted yield in the adjacent field area, converted into total grain N (based on 13 and 11% grain protein content as targets for wheat and barley) and divided by the fertiliser N grain recovery (Eq. 3). The fertiliser N grain recovery was the same as the one used for the ‘Simplified Mass Balance’ method (Eq. 2). The total N recommendation rate was given as the mid-season N requirement plus any N applied previous to GS-31.

$$N\;rate=\frac{{(Y}_{Rich}-{Y}_{field}) \times Protein\;Factor}{Fertiliser\;N\;Grain\;Recovery}+Applied\;Initial\;N$$
(3)

where, ‘Yrich’ and ‘Yfield’ were the yield predictions in kg/ha made at GS-31 stage at the ‘rich’ and ‘normal field’ area; the ‘Protein Factor’ was 0.0228 for wheat (for a 13% protein target) or 0.0176 for barley (for a 11% protein target); the ‘Fertiliser N Grain Recovery’ was 0.3 (Eq. 2), meaning 30% of applied N fertiliser is expected to get into grain yield; and the ‘Applied Initial N’ was any N applied in kg/ha prior to GS-31.

There were two variants of this sensor-based N recommendation method regarding how grain yield was predicted: the ‘Yield Prediction (LM)’ method and the ‘Yield Prediction (RF)’ method. For the ‘Yield Prediction (LM)’ method, grain yield was predicted from the mid-season (GS-31) Crop Circle NDVI divided by growing degree days at the date of sensing using a linear model (LM) which is a similar approach to the one used by Raun et al. (2005). For the ‘Yield Prediction (RF)’ method, a Random Forest (RF) regression model using a range of digital variables was used to predict grain yield. These two variants were designed to allow the assessment of the potential benefit of machine learning when the decision is underpinned by a mechanistic agronomic framework (i.e., the mass balance approach). To select the variables for the RF model, a recursive feature elimination (RFE) process was used to rank all available variables listed in Table 3 based on their importance. Then, a variable selection was made by eliminating the least useful variables. Variables considered to be too similar to each other were also eliminated. For example, when both NDVI and NDRE had similar importance based on the RFE, the least useful one was eliminated. Generally, the manual filtering sought to select only variables that were considered to provide information on different agronomic aspects of the system, reducing the likelihood of overfitting the model. In both methods for yield prediction, the models were generated separately for wheat and barley. Also, for every field where yield was being predicted, only data from the other remaining fields were used for modelling.

Digital methods based on crop response prediction

Sensor-based methods following a ‘crop responsiveness’ framework were also tested in this study (7th to 10th methods listed in Table 4). These methods were based on the general idea behind the Crop Circle algorithm, which uses the Holland and Schepers (2010) approach. The sensor was used in combination with the on-farm N strips to make observations of crop response to N application near the GS-31 crop stage. Following the same process as for the local response analysis with harvest data (Fig. 2), for each moving window along the length of the trial, a quadratic relationship was generated between the sensor-based vegetation index and N rate and this function was used to assess the optimal N rate; that is, the N rate that maximised the vegetation index. The main limitation of this method is that it assumes that the crop response function observed mid-season is equivalent to the one observed at harvest, which may or may not be appropriate; certainly, it ignores outstanding weather events after the mid-season scanning. There were four variations of this approach testing the NDVI and NDRE as input data when obtained either from the Crop Circle sensor or from the Sentinel 2 imagery.

Digital method based on N sufficiency

The last agronomic approach used to frame a sensor-based recommendation in this study was the ‘N sufficiency’ approach (11th method listed in Table 4). Despite the sound agronomic theory behind it, the implementation of such a framework through a sensor-based approach has been much less common compared to the NFOA or the Holland and Schepers algorithms (Colaço and Bramley, 2018; Franzen et al., 2016). The approach used here was first presented in Colaço et al. (2022). This framework sought to guarantee adequate plant nutrition by monitoring plant nutrient levels and using a nutrient concentration threshold to calculate the N requirement for a mid-season management intervention. To account for nutrient dilution in the plant, the threshold, also referred to as the ‘critical nutrient level’, varied according to crop biomass; that is, the critical N level was lower for larger biomasses. Overall, there were two main components to this approach. The first involved sensor-based predictions of crop biomass and plant N concentration (a) and the second made use of (b) a predefined dilution curve to determine the critical nutrient concentration for each biomass level.

  1. a)

    Biomass and N concentration sensor estimates: For the purposes of this study, Machine Vision (MV) technology was used for the biomass and plant N concentration assessment. Biomass and N concentration models were generated from multiplicative combinations of MV variables (e.g., Gebremedhin et al., 2019) and the sampled biomass and N concentration data using linear regression. Models were constructed for the combined wheat and barley data; preliminary investigation showed no benefit for the predictions when models were made independently for each crop. The biomass model was generated using a combination of NGRDI and canopy cover with R2 = 0.55 and RMSE = 374 kg/ha compared with the sampled data. The N concentration model was generated using a combination of NGRDI and canopy cover with R2 = 0.28 and RMSE = 1.0% compared with the sampled data. Another option for mapping crop biomass at scale would be the use of LiDAR, as described by Colaço et al. (2021b) and a further, arguably more convenient, alternative to ground-based approaches for biomass estimation might be that based on freely available remotely sensed imagery (Perry et al., 2022).

  2. b)

    Dilution curve: Fig. 3 shows the N dilution plot used for this study which was derived from the sampled plant biomass and N concentration data collected across the project trials added with available data from previous research conducted in a similar Australian environment (Fitzgerald et al., 2010). Dry biomass was plotted against N concentration for each sample, and boundary lines were generated using exponential decay functions; the upper function was used as the target (threshold) N concentration for each biomass level.

The N requirement was calculated as the difference between a target mid-season crop N uptake and the current crop N uptake divided by the fertiliser N crop recovery (Fig. 3 and Eq. 4). The target crop N uptake was given by the estimated biomass in the N ‘rich’ area multiplied by the critical N level for such biomass. The ‘current’ crop N uptake was simply the estimated biomass multiplied by the estimated N concentration in the ‘normal field’ area (i.e., adjacent to the strips). The fertiliser N crop recovery was the relative amount of applied N expected to be absorbed by the aboveground plant biomass. It was calculated as the difference in plant N uptake between the ‘normal field’ and the ‘zero’ strip divided by the difference in applied N in these areas (similar to Eq. 2 but using mid-season plant N uptake instead of grain N removal). The plant sampling data were used for this calculation and the recovery factor was given as the average across all trials. The total N recommendation rate was given as the mid-season N requirement plus any N applied previous to GS-31.

$$N\;rate= \frac{{(Biomass}_{rich}\times Critical \;N \;Level)-({Biomass}_{field}\times {N\%}_{field})}{Fertiliser \;N \;Crop \;Recovery}+applied \;initial \;N$$
(4)

where, ‘Biomassrich’ and ‘Biomassfield’ were sensor-based estimates in kg/ha of aboveground dry biomass at the N ‘rich’ and ‘normal field’ areas (see above); the ‘Critical N Level’ was the nutrient critical level associated with the estimated biomass at the ‘rich’ area obtained through the predefined dilution curve (Fig. 3); the ‘Fertiliser N Crop Recovery’ was 0.35 meaning 35% of applied N fertiliser gets absorbed into the above ground crop biomass; and the ‘Applied Initial N’ was any N applied in kg/ha prior to GS-31.

Fig. 3
figure 3

The N dilution plot guiding the N sufficiency recommendation. The blue dots represent plant data collected across all trials in the project including mid-season and harvest plant sampling. The smaller grey dots are samples collected from a previous separate study (Fitzgerald et al., 2010). The black dot and the red cross are hypothetical field data representing a current crop assessment and the target for fertilisation with the latter being based on the biomass assessed from the N ‘rich’ area associated with a target plant N concentration for such biomass

Digital method based on an empirical, data-driven approach

The final N recommendation method evaluated was based on an empirical data-driven (DD) approach (last two methods listed in Table 4). Contrary to other frameworks, this method did not rely on any agronomic mechanistic calculation of the N rates; likewise, digital variables were not modelled to predict crop parameters to enable this sort of calculation. Rather, the available digital inputs were modelled directly against EONR using RF regression. In essence, this empirical approach provides a recommendation by assessing current site and season characteristics and relating these to conditions for which optimal N rates are known. Therefore, it predicts the final EONR using data that are available until the GS-31 crop stage (Table 3). Note that the data from soil and plant sampling mentioned earlier were not used for this approach. Only data from publically available sources or those collected using digital sensors on the field were used for this DD method (Table 3). Aside from the data listed in Table 3, a range of OFE variables derived by combining the crop sensing data with the strip information were used as potential predictors of EONR. The following OFE variables were generated at the moving window basis: the vegetation index observed independently in the ‘rich’, ‘zero’ and ‘field’ areas; the vegetation index ratio between ‘rich’ and ‘field’ areas, and between ‘field’ and ‘zero’ areas; and the N rate recommendations derived from the ‘Response Function’ sensor-based methods used as predicting variables. The OFE variables were derived using both the NDVI and NDRE sourced from the Crop Circle sensor and Sentinel 2 satellite. Similar to the ‘Yield Prediction (RF)’ method, RFE was used to rank the variables which were then filtered down to a final set of predicting variables. Again, in the case of predicting variables that were considered too similar to each other, only the most useful one was used; this manual filtering was especially important when selecting a minimal subset of the available OFE variables described above.

The DD method was implemented in two variations: a ‘data abundant’ and ‘data limited’ mode to simulate different levels of data availability. These scenarios were developed by changing the type of model validation. The ‘data abundant’ scenario represented a best-case scenario where the site and season for which the model was being applied were well represented in the data used to build the model. In this case, a 50–50% random split in the data was used for training and validation of the model; in effect, this meant that data from all sites and seasons were used to build the model. Because data from the field where the recommendation is being made was used to train the model, it is highlighted that this method represents a hypothetical scenario in which abundant historical information about the field in which the decision is to be made is available. Therefore, this approach sought to assess the upper potential of performance for an empirical DD approach. The ‘data limited’ scenario represented a scenario where the data used for training and validation of the model had to rely solely on data from other sites. In other words, it was simulating a scenario where very little was known about a particular site. Overall, this approach represented the actual performance of the DD method using the field data that was available for this research. In both situations, wheat and barley models were created independently of each other.

Comparative assessment of N recommendation methods

The recommendation methods described above were evaluated at each moving window in the on-farm experiments; note that recommendations and the assessment of the different methods were therefore restricted to the strip area (‘rich’, ‘zero’ and the adjacent normal field area). Thus, they were not implemented or assessed across the entire field. All digital N recommendations, along with the EONR and ‘Max Yield’ reference methods, were generated both site-specifically, where each moving window had a specific N rate recommendation, and uniformly, by averaging the site-specific rates across all moving windows of a same field. The ´Farmer´ and the ‘Simplified Mass Balance’ recommendations were evaluated only as a uniform rate application.

The main assessment process for these recommendations was designed to provide a comparison between the ability of the different methods to predict N requirement across the sites, in terms of both average accuracy and average profitability. The EONR generated on a site-specific basis from the field response trials were the absolute reference against which all other recommendations were evaluated. Thus, the error for each recommendation was calculated based on the site-specific EONR. Also, the expected profitability of each method was obtained by inserting each recommended N rate into the respective site-specific profit response function at each moving window (Fig. 4). Results were aggregated for each trial based on the root mean squared error (RMSE) and the average partial profit across the field. To facilitate the interpretation of results, the average partial profit of each recommendation method was normalised against the EONR partial profit achieved at each site, producing a ‘normalised partial profit’ (NPP), with the EONR having an NPP = 1. This was considered important given the differing production potential of the various sites. Results were then averaged across all trials and plotted into an error (RMSE) by profit (NPP) biplot.

Fig. 4
figure 4

The process of building a database with observations of economically optimal N rates (EONR) using local response curves along the length of a N strip trial (1), and the assessment of the error and expected profitability of different N recommendation methods using the local response function (2). The illustration of the strip experiment is out of scale and values in the tables are hypothetical

Due to restricted availability of data at some sites, the N sufficiency method could only be implemented in eight trials, rather than all 21. To allow the comparison with the other methods which were assessed based on all trials, the N sufficiency average results regarding RMSE and NPP were adjusted based on their value relative to the ‘Farmer’ results. For example, if the average RMSE for the ‘N Sufficiency (MV)’ method across the eight sites was 35 kg/ha, and the ‘Farmer’ average RMSE was 30 kg/ha, the ‘N Sufficiency (MV)’ result was given a relative value of 1.16. To plot this result across all trials, if the average ‘Farmer’ error for all trials was 40 kg/ha, then the adjusted error for the ‘N Sufficiency (MV)’ method would be 40 × 1.16 = 46.4 kg/ha.

Results

In general, the results highlighted the strong performance of the collaborating farmers in managing N fertilisation (Fig. 5; Table 5). Their uniform management achieved 94% of the maximum profitability represented by the site-specific EONR and almost reached the optimum management for a uniform application; the difference between the farmer application and the uniform EONR was around 2% (Table 5). The fact that the farmer recommendation was near optimum at the field level means that one route to improve N management for these high performing farmers is to augment the spatial resolution of their management; that is, to implement effective N decisions at the site-specific level. The ‘Farmer’ recommendation also outperformed the ‘Simplified Mass Balance’ approach and most univariate, sensor-based, digital N recommendation methods.

Fig. 5
figure 5

Error by profit biplot showing the average results of various N recommendation methods across 21 large scale on-farm trials

Table 5 Error and profitability average results of various N recommendation methods

The only approach that showed potential to exceed the farmer management performance was the DD (abundant) method, which did not rely on a mechanistic calculation of N rates; instead, this approach uses a ML model trained against empirically derived EONR. In the ‘data abundance’ scenario, when the current field and season factors that drive crop N requirement are well represented in the model training dataset, the DD recommendation improved the farmer N management profitability by 5% (or A$47/ha when a baseline for NPP of A$1000/ha is used). Arguably of greater importance is that it also reduced the error from 42 to 15 kg/ha suggesting that it offers considerable value to those concerned with the risk associated with N decision-making. On the other hand, in a situation of limited data, that is, when the model needs to rely on datasets that do not fully cover the current field scenario, the DD (limited) method was similar to the ‘Simplified Mass Balance’ approach and to other mechanistic sensor-based methods. This result highlights the value of input data specific to the field of interest.

The selected variables for the DD EONR prediction with RF are presented in Fig. 6, which makes clear the importance of those derived from the OFE for the EONR prediction. These were the Sentinel NDVI from each N strip (both ‘rich’ and ‘minus’) and the N rate associated with maximum mid-season Sentinel NDVI. Note that vegetation index ratios between strips, which are commonly used in traditional sensor-based approaches (e.g. Raun et al., 2005) were also tested, but they were shown less useful than the ‘Resp Func’ recommendations when used as predicting variables based on the RFE results; thus, only the latter were used as predictors.

Fig. 6
figure 6

Variables selected for EONR (a) and yield (b) prediction ranked based on their relative importance for the Random Forest Model using Recursive Feature Elimination. Refer to Table 3 for an explanation of the variables

All digital recommendation methods framed around mechanistic agronomic N decisions (‘Yield Prediction’, ‘Response Function’ and ‘N Sufficiency’) scored near or below the Farmer performance. Because of their large recommendation error—the RMSE for these methods was at least 40 kg/ha—there was no benefit in implementing them site-specifically as opposed to uniform management. Indeed, in some cases, using a uniform application reduced the error substantially. Overall, this result suggests that methods in which the recommendation error is expected to be large are better implemented as the average for the field instead of site-specifically.

The mechanistic sensor-based methods that got closer to the Farmer N management performance were the ‘N Sufficiency (MV)’, the ‘Yield Prediction (RF)’ and the ‘Response Function (NDVI Sent)’, all at the uniform application level (Fig. 5). The Response Function approach was sensitive to the type of input data; that is, some combinations of vegetation indices and their data sources may match a profit response function better than others. In the case of this study, the sensor-based response function that most resembled the final profit function was the one derived from Sentinel 2 NDVI. Since Sentinel data can be accessed for free, the fact that it out-performed the methods which used proximal crop sensors represents an important result for farmer adopters of such PA technologies. The ‘N sufficiency’ method, a less common sensor-based approach, had a similar performance to the ‘Yield Prediction’ methods.

Enhanced analytics based on machine learning failed to improve the performance of a mechanistic sensor-based method underpinned by a mass balance recommendation. This can be assessed by comparing the two variants of the ‘Yield Prediction’ approach. Despite the more sophisticated yield prediction algorithm used by the ‘Yield Prediction (RF)’ method, there was little benefit to the final N recommendation as compared to the simple regression technique used by the ‘Yield Prediction (LM)’. The use of multiple variables coupled with the RF algorithm did improve the R2 of the yield prediction from 0.22 to 0.42 (data not shown) as compared to the simple regression. However, that did not translate into a practical benefit for the N recommendation; the difference in the recommendation error between the two methods was less than 1 kg/ha in RMSE (Table 5). Of interest, the selected variables for yield prediction with RF are available in Fig. 6.

Discussion

An extensive field program of large scale on-farm trials was conducted across Australia to assess a range of sensor-based mid-season N management strategies. Methods were assessed based on the recommendation error (RMSE) against observed optimal N rates in the field and the associated impact on crop production profitability. This approach has not been common for the evaluation of site-specific N management strategies in PA research. In the review of Colaço and Bramley (2018) on available sensor-based N decision tools, none of the studies reviewed measured the accuracy of the N rate recommendation—it is, in fact, ironic that research on ‘precision’ agriculture did not assess the ‘precision’ (or the error) of management practices. Instead, results of sensor-based N application have been mostly reported based on their impact on yield and or input consumption against farmer management, which may lead to biased evaluations of the technology, depending on how good the current farmer practice is. For a pragmatic investigation, measuring the error of a given recommendation is necessary, but the method’s profitability should be just as crucial, as it is the relationship between error and profit that can be used to distinguish differences in risk between techniques. Conscious of this, the main output of the analytical framework used in this work was precisely an ‘error by profit’ biplot (Fig. 5) showing the performance of the investigated recommendation strategies.

The results show that as the performance of the different methods approached the optimal input N rate, the impact on profitability reduced. Note that profitability increased with higher N recommendation accuracy (lower RMSE) but the rate of increase in profitability reduced as the methods became more accurate (Fig. 5). This reflects the flatness around the maxima of typical response curves describing the relationship between production and input use (e.g. Figures 2 and 4). The implications of this phenomenon to PA, and more specifically to site-specific nutrient management, have been thoroughly discussed by Pannell (2006, 2017) in terms of production profitability. In summary, this allows for a wide range of N rates near the optimal level to have a similar impact on crop performance and profitability, thus potentially limiting the benefit from more accurate N recommendations. This also explains the generally good performance—between 85 and 95% of maximum partial profit—of most recommendation methods. Moreover, if the cost of technological solutions for N management is expected to increase with greater recommendation accuracy, it could be inferred that there may be thresholds in recommendation error below which a change may not be economically justified. Nonetheless, the reduced error associated with the data-driven recommendation equates to greatly increased confidence that the farmer can have in their N decision if the data-driven recommendation is followed. It is also worth noting that the production profit analysis used here does not consider the potential environmental implications of flat N response functions. Such flatness (around the maxima) of typical response curves, coupled with the ability to identify an optimum application rate with low uncertainty (error), should mean that there is flexibility in the system to optimise N recommendations that consider environmental and resource use efficiency impacts with minimum penalty to crop profitability (Rogers et al., 2016). Such flexibility will no doubt be important in future N decision tool developments.

Considering the above, and despite the diminishing gains in profitability from more accurate N management, the analysis presented here for Australian grain production showed that there is an opportunity to improve the profitability of farmer N management by up to 6% (or around A$60/ha). It is important to note that this result was obtained for a cohort of farmers who are leaders in their field. As such, the historic average yield for the field trials in this study (based on the yield monitor data) was around 1.7 times the regional average yield (ABARES, 2022). Likewise, whilst the regions covered by this work had an average yield gap of around 60% ([1-(actual yield/yield potential)]; CSIRO, 2022), the collaborating farmers showed an average yield gap of around 15% based on the actual obtained yield and the regional yield potential; data not shown. On the assumption that farmers operating a little further from the optimum, compared to the collaborators of this research, have larger yield gaps than 15%, it could be speculated that the impact of adopting the data-driven approach may be increased for ‘average’ farm businesses compared to that reported here, due to the greater potential for improvement. On the other hand, the overall gains from optimized N management might be constrained by other manageable factors (for example pH, trace element deficiencies, etc.) that have not been optimized on less well managed farms. Somewhat perversely, the present results suggest that in order to assess the potential average impact of different N management practices, future studies will benefit from collaboration with farmers whose management is not as close to the ‘top of the tree’ as those involved in the present work.

The univariate sensor-based N systems evaluated in this study that followed mechanistic decision frameworks can nearly match the performance of farmers at the uniform management level. Therefore, the results for these technologies suggest they can be regarded as useful in the automation of decisions to help improve management for farmers that currently operate at lower average N management profitability. However, such sensor-based approaches appear insufficient as drivers of site-specific N management strategies for underpinning adoption of PA. Their inherently large error does not warrant use in variable-rate fertiliser interventions, and they are better implemented as a uniform average rate across the field. This finding is contrary to much PA research on the use of current commercial sensor-based N tools supporting approaches based on the NFOA (e.g. Li et al., 2009; Ortiz-Monasteiro & Raun, 2007; Sapkota et al., 2014; Tubaña et al., 2008a, b) or on the Holland and Schepers (2010) algorithm (e.g. Stamatiadis et al., 2017) as effective site-specific management strategies. Not surprisingly, those studies, and many others reviewed by Colaço and Bramley (2018), did not assess the recommendation error between site-specific and uniform sensor-based N application. It is also noted that there is considerable irony in that, whereas such on-the-go sensor-based approaches were developed with the primary intention of supporting continuous variable rate application, their use generally relied on crop response information derived from average values obtained within plots or strips (see references above and those available in Colaço & Bramley, 2018). This issue was compounded by the use of ramped strips (Roberts et al., 2011). It is recognised that the reason why this study was able to account for within strip variability in crop response was because the analysis was restricted to the trial area, as opposed to making recommendations across the entire field; as such, specific crop response information (using the ‘rich’ and ‘zero’ areas) was available for the recommendation in each moving window. In a real application scenario, in which recommendations are needed across the entire field area, the use of strips allow, at best, an average response to be used for each zone. To overcome this issue, future work should investigate the use of available covariate data to estimate crop response away from the strip trial area. Another alternative would be the use of more replicated trials (e.g. Bullock et al., 2019; Cook & Bramley, 1998), in which crop response information varies even within the same management zone, and is available nearby the point where the recommendation is being made. In addition, a more replicated design could potentially contribute to more N rates being used, thus potentially improving the generation of local crop response functions (Trevisan et al., 2020).

The attempt to improve a sensor-based mechanistic N rate recommendation (based on a mass balance approach) through enhanced analytics—by using a multivariate digital input and ML for crop parameter prediction—was not effective, which runs counter to the general assumption where ML algorithms are presumed to be necessarily better than simpler modelling approaches. Results from this and previous research (Colaço et al., 2021a) suggest that the limitation for mechanistic fertiliser recommendations may lie not in the analytics used, but in the agronomic decision framework itself. Despite the better prediction of yield potential using ML, the benefit for the N rate recommendation derived from it was negligible. Such results suggest that some inherent limitations of simplified N rate calculations constrain their efficacy. The assumptions around expected grain quality and fertiliser N recovery, for example, appear to prevent them from generating accurate fertiliser recommendations at the site-specific level, likely also due the fact that variation in other factors affecting crop performance, such as available soil water, are ignored in such simple approaches. It has been noted that a large effort in the research community (see reviews of Chlingaryan et al., 2018; Liakos et al., 2018; Mishra et al., 2016; van Klompenburg et al., 2020) to improve crop parameter prediction through ML, especially for crop yield prediction, has been undertaken without a critical consideration of the limited ability of current agronomic decision tools to make appropriate use of such information. This and previous research (Colaço et al., 2021a) suggests a shift in focus in which ML and OFE are used as the means for the agronomic decision itself. The alternative, which was implemented in this work, used available multivariate digital inputs, coupled with OFE information and ML, to train data-driven N decision algorithms empirically. That is, rather than focusing on the prediction of yield potential to enable a mass balance N rate recommendation, the ML model predicts optimal N application rates directly from the digital input data by assessing current field and season variables and relating them to conditions for which optimal N rates are known.

The range of digital input variables used in this study, covering various factors driving crop N requirement, coupled with the RF algorithm, successfully identified the optimal N rate at the site-specific level and for variable scenarios when sufficient historical data was available. From the group of variables tested in the DD method, those derived using information from the strip response trial featured as the most crucial predictors. Whilst this work did not directly assess the value of soil moisture information—as did Lawes et al. (2019) and Colaço and Bramley (2019)—which is presumed to be of high value for N management decision making in dryland farming systems, it can be considered that OFE variables indicating a crop response to N application incorporate the effects of soil moisture up to the time of measurement. Other important predicting variables worth investigating might be those related to weather forecasting and crop development beyond the time when the mid-season N decision is made, the lack of which are probably important contributors to the recommendation error (15 kg/ha of RMSE) of the DD method in this work.

The idea that a ML algorithm could replace a complex agronomic mechanistic model to arrive at effective N recommendations was first proposed by Lawes et al. (2019). In that case, an RF model was trained using crop and soil parameters, such as leaf N concentration and soil moisture status, against optimal N rates derived from a simulated APSIM (Agricultural Production Systems Simulator; Holzworth et al., 2014) model dataset. The work of Colaço et al. (2021a) further developed that approach by training the empirical model using digital variables such as sensor-based vegetation indices and in situ soil moisture sensor data as model input. However, the latter study was based on a plot-based experiment at a single site. The present work demonstrates the potential of a data-driven empirical model in real field scenarios using digital variables that are accessible from on-farm and public data sources. The recommendation error obtained in this study (15 kg/ha of RMSE) is consistent with results found in Colaço et al. (2021a) and Lawes et al. (2019). These current results also show the importance of OFE information and the use of multivariate predictors, which included weather and landscape variables, for the DD decision model. However, due to the empirical nature of this approach, and the minimal agronomic domain knowledge embedded in it, the model is crucially reliant on an extensive historical dataset. The success of the approach relies on availability of sufficient historical on-farm data that reflect crop response to inputs in such a way that it covers the range of variation in expected field and season conditions in which the model will be used.

A key enabler for the construction of such a digital database is the implementation of OFE. More specifically, on-farm trials should be automated and embedded into normal farm operations so that large crop response datasets can be collected effortlessly across multiple farms and seasons. To achieve that, experimental designs can be implemented using variable-rate controllers and GNSS machine guidance systems; the fields can be monitored during the season using remote and proximal sensing technology and then harvested through onboard monitoring systems. A system in which farmers share OFE data across larger communities may also play a role in building the necessary database for empirical data-driven decision tools.

While this work used simple analytics and OFE trial designs, future work should continue to investigate and test other field approaches; for example, more replicated trials and/or approaches that allow extrapolation of crop response information away from the trial area, as mentioned above. Investigation of more sophisticated methods to analyse OFE spatial data is also warranted; for example, those that account for spatial autocorrelation in the data such as geographically weighed regression (Evans et al., 2020; Rakshit et al., 2020) or geostatistical methods (Bishop & Lark, 2006; Jin et al., 2021). Nonetheless, in considering such approaches, the trade-off between trial complexity and farmer utility and pragmatism will need careful consideration (Bramley et al., 2022). In relation to the development of data-driven models, future work should also investigate the interaction between the geographical coverage (i.e. local, regional, national, etc.) of the data and derived models, and the amount of data needed to build effective models. Techniques to embed mechanistic knowledge and agronomic decision rules (for example, using crop modelling techniques) into empirical ML algorithms are also worth exploring, as they could potentially reduce data requirements and expand model applicability across more variable scenarios. Overall, this work shows a promising pathway for digital on-farm technology, artificial intelligence and OFE to drive production systems closer towards season- and site-specific economically optimum operations.

Conclusions

This work assessed a range of N recommendation methods, including digital approaches and current farmer practice, using a large program of field-scale trials across Australia over four years. The main lessons learned from this research are:

  • Empirical, multivariate, data-driven methods have potential to reduce the error and increase profitability of fertiliser management, but they are heavily reliant on extensive on-farm digital databases.

  • On-farm experimentation is a critical enabler of data-driven decision tools as it allows the automated collection of large digital datasets of crop response to applied nutrient needed to train empirical ML algorithms. Such OFE should desirably be adopted as a core element of farm business operation to support decision optimisation.

  • Current univariate sensor-based algorithms underpinned by simplistic agronomic decision frameworks can match the performance of farmer management at the field level. However, due to inherently large error, recommendations are better implemented as the average for the field rather than site-specifically. As such, these technologies should be treated with caution as platforms to support variable rate site-specific management.

  • Digital recommendation frameworks that make use of ML techniques but force their output into simplistic agronomic decision frameworks fail to improve accuracy and profitability of fertiliser management decisions.

  • The development of site-specific nutrient management solutions must be conscious of the flatness around the maxima of typical crop response functions limiting the gain in profitability through greater accuracy of fertiliser recommendations.