Evaluating the accuracy and usefulness of commercially-available proximal soil mapping services for grassland nutrient management planning and soil health monitoring

Grasslands play an important role in global food security. However, there is increasing pressure to improve the sustainability of ruminant farming. Precision nutrient management tools (e.g., proximal soil sensors for soil mapping) offer opportunities to improve nutrient use efficiency through spatially variable nutrient application rate maps. Despite little research validating these technologies on grasslands, commercial companies promote them to grassland farmers. In this study, the accuracy of services offered by commercial companies was evaluated by comparing soil pH, P, K, Mg and SOM measurements derived from conventional soil sampling and laboratory analyses with measurements derived from the commercial operators, across a range of soils that are typically found in UK grasslands. Results showed that soil mapping services utilising gamma-ray spectroscopy (GRS) were not sufficiently accurate to predict soil pH, P, K and Mg on grasslands, and were consequently inappropriate for nutrient management planning for variable rate lime and nutrient application. Conversely, both GRS and visible-near infrared spectroscopy (Vis–NIR) accurately predicted between-field SOM variations in grassland soils but not within-field variation. This study emphasises the need for further research to explore the limitations of, and opportunities for, the universal application of these technologies across different soil types and/or land uses before their commercial application. It is therefore highly recommended that commercially-available soil mapping services be subject to certification, similar to centralised soil testing laboratories, to ensure data are accurate for soil management interpretation. The lack of reliability of such systems risks undermining farmers’ confidence in the value of soil mapping, which could severely hinder future adoption of potentially valuable technologies.


Introduction
Globally, there is increasing pressure on the agricultural sector to adopt good soil management practices to improve sustainability. These include the maintenance of soil structure by enhancing soil organic matter (SOM) content, avoidance of soil compaction and overworking, promoting soil biodiversity, minimizing water runoff and improving nutrient use efficiency (Bronick & Lal, 2005; Goulding et al., 2008). Most approaches for monitoring changes in SOM status (AHDB, 2018) and improving on-farm nutrient use efficiency through nutrient management planning (AHDB, 2020) rely on conventional soil testing. Typically, this involves the analysis of bulked samples from multiple sites across a field to measure SOM and a range of soil quality metrics. The quality metrics measured are referred to as standard soil pH and index tests (e.g., P, K & Mg indices), which are globally recognised and used in various guidelines in the UK (AHDB, 2020; Farm Advisory Service, 2019), Ireland (Wall & Plunkett, 2021), New Zealand (Fertiliser Association of New Zealand, 2018) and several states in the US (e.g., University of Wisconsin-Extension, 2013). Test results are then used to determine the necessary rates of lime, inorganic fertiliser (P, K, Mg and S) and/or organic inputs (animal manure and/or slurry) for optimal plant/crop growth while minimising losses to the environment (e.g., N loss through leaching).
Whilst conventional soil testing remains the most accurate method for measuring these soil variables, it can be laborious and provides only limited information on spatial and temporal variability. In terms of management, this can result in areas within fields being over-fertilised, with a risk of nutrient losses to the environment, or under-fertilised, resulting in sub-optimal yields (Bönecke et al., 2021; Maleki et al., 2008). Improving nutrient use efficiency requires full capture of the intrinsic spatial and temporal variability in soil properties within a field. This often requires zonation of fields and a large number of samples (ca. 50-100 ha⁻¹; Kerry & Oliver, 2007), which is neither time- nor cost-effective. To address these issues, precision nutrient management tools (e.g., proximal soil sensors) have been developed to map soil properties at high spatial resolutions. Proximal soil sensors, defined as field-based sensors in close proximity to the ground (Rossel et al., 2011; Vereecken et al., 2016), use a wide range of measurement methods, including electrical and electromagnetic, optical and radiometric, mechanical, acoustic, pneumatic and electrochemical technologies (Adamchuk et al., 2004). These data are then processed with algorithms based on prior calibrations to estimate a wide range of soil properties, including SOM (Shi et al., 2015), pH (Wang et al., 2015; Wenjun et al., 2014), heavy metals (Kalnicky & Singhvi, 2001), soil texture (Rossel et al., 2009) and nitrate (Jones et al., 2018). Results can then be used to determine spatially variable organic/inorganic fertiliser (Kassim et al., 2021) and lime application rates (Bönecke et al., 2021), which can improve agricultural nutrient use efficiency and reduce the risk of nutrient losses to the environment. However, this is only possible if the data collected are accurate and provide estimates of the target variable (e.g., soil pH) that are precise enough to underpin spatially variable management application rates (e.g., lime application to correct soil pH).
Methods used to provide soil mapping services that have been commercialised and introduced to agricultural practice through agribusinesses include visible and near-infrared spectroscopy (Vis-NIR) and gamma-ray spectroscopy (GRS). Vis-NIR methods use light sensors in the visible and near-infrared range to quantify the reflectance of the soil, which is then correlated with various soil geochemical properties (e.g., SOM) (Stenberg et al., 2010), whilst GRS correlates spectral features of the gamma radiation naturally emitted from the soil with soil properties (Reinhardt & Herrmann, 2019). Unfortunately, their universal use on different soil types and/or agricultural land uses can be limited by factors that interfere with these proximal measurements and that are poorly understood. This is because the relationships between the directly measured variables (e.g., gamma-ray emissions) and the soil properties of interest are indirect, and depend on associated variables in ways that may differ between sites, soil types and land uses. For example, GRS and Vis-NIR measurements can be strongly influenced by soil surface characteristics, mineralogy and chemical properties (Dierke & Werban, 2013; Reinhardt & Herrmann, 2019). The conditions under which the algorithms are calibrated determine the specific settings (e.g., soil type) in which they can be accurately applied (Visser et al., 2021). Because their universal application has been evaluated so little in the literature, little is known about whether they are appropriate for soil mapping on less-studied soil types and/or land uses such as grasslands.
These commercial products typically use electromagnetic induction (EMI), Vis-NIR or GRS for mapping soil properties, alongside global navigation satellite system (GNSS) positioning for mapping landscape features (e.g., slopes). Such soil mapping systems are widely promoted to grassland farmers, despite their accuracy having been validated far less on grasslands than in other farming systems, such as arable and vegetable production (Higgins et al., 2019). This creates uncertainty, particularly as there may be greater spatial and temporal variability in soil nutrients in grasslands than in arable systems because of differences in climate (e.g., grasslands are typically in areas of higher rainfall), landscape (e.g., steeper slopes), excreta returned to the soil by grazing animals (van der Weerden et al., 2020) and selective grazing behaviour (Parsons et al., 1991). A further concern is the lack of commercial regulation to ensure that reliable and accurate data are provided by soil mapping agribusinesses, particularly as commercial soil testing laboratories, which offer the same services without the spatial attribution, require mandatory ISO accreditation (ISO/IEC 2017). There is also increasing evidence that combining different proximal soil technologies (e.g., GRS, electrical conductivity and Vis-NIR) significantly improves prediction accuracy, with the potential for more universal application (Ji et al., 2019; Vasques et al., 2020), yet many agribusinesses do not take this multi-sensor approach.
To the authors' knowledge, no studies to date have evaluated whether commercially operated soil mapping services provide end users (e.g., farmers) with accurate soil data and with accurate variable input rates derived from those data. In light of this, by comparing soil pH, P, K, Mg and SOM measurements derived from conventional soil sampling and laboratory analyses with measurements derived from commercial operators, across a range of soils that are typically found in UK grasslands, the following objectives were addressed: (1) evaluate the accuracy of commercially-available GRS technology for soil index testing at high spatial resolutions on grasslands, (2) validate the appropriateness of these commercially-available GRS technologies for calculating variable rates for inputs of lime, inorganic fertiliser (P and K) and/or organic resources (animal manure and/or slurry), and (3) evaluate the accuracy of commercially-available Vis-NIR and GRS technologies for estimating SOM for soil quality monitoring.

Study site and sampling fields
The study was carried out across three contrasting grassland fields on a commercial farm located at Bangor University's Henfaes Research Centre (53°14′13.2″N, 4°00′58.3″W) in Abergwyngregyn, Wales, UK (Fig. 1). The three grassland fields were chosen to represent a range of management intensities (e.g., low and medium) and grassland locations (e.g., upland and lowland), reflecting much of the grassland used for livestock farming across the UK. Differences in management inputs and grazing intensities for each field are described in Table 1.

Low management intensity lowland pasture/field (LL)
A lowland field with Eutric Cambisol soils, a mean elevation of 3.8 m (range 2.89-7.0 m) and a mean slope of 3.4% (0.2-10.1%). The field is an old permanent pasture (> 25 y), subject to seasonal saline ingression and surface flooding during winter months. It has been managed since January 2014 under an agri-environment scheme (Welsh Government, 2019), which prohibits the application of inorganic fertilisers, slurry and lime (Table 1). Some grassland management operations, such as topping and cutting rushes, have taken place. Grazing intensity (Table 1) has been regulated by the requirement to maintain specified sward height ranges during defined periods of the year, whereby > 20% of the sward must be less than 70 mm in height and > 20% of the sward must be greater than 70 mm during the period 31st July to 15th March (Welsh Government, 2019).
Since 2004, more extensive management has been implemented (Table 1).
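The sward-height condition above is a two-sided threshold rule. As a minimal sketch, assuming the sward is surveyed as equally weighted point readings (the scheme's actual assessment method is not described here), it could be checked as follows; the function name and readings are hypothetical, and the 31st July to 15th March window is not modelled.

```python
def meets_sward_height_rule(heights_mm, threshold_mm=70.0, min_fraction=0.20):
    """Check the two-sided sward-height condition described for the LL field.

    The condition is met when MORE than 20% of readings are below 70 mm and
    MORE than 20% are above 70 mm. Assumption (not from the scheme rules):
    each point reading represents an equal share of the sward, and readings
    exactly at the threshold count towards neither side.
    """
    heights = list(heights_mm)
    if not heights:
        raise ValueError("at least one sward height reading is required")
    n = len(heights)
    frac_below = sum(h < threshold_mm for h in heights) / n
    frac_above = sum(h > threshold_mm for h in heights) / n
    return frac_below > min_fraction and frac_above > min_fraction


# Hypothetical readings (mm), not survey data from this study.
print(meets_sward_height_rule([50, 60, 80, 90, 100]))       # two of five below 70 mm
print(meets_sward_height_rule([50, 80, 90, 100, 110, 120]))  # only one of six below
```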

Low management intensity upland pasture (LUp)
This land parcel forms part of a unit of enclosed upland grassland with Haplic Podzol soils, a mean elevation of 334 m (322-348 m) and mean slope of 27.6% (7.6-35.3%). Like many areas of upland Wales, the field was agriculturally improved in the 1950s, though routine applications of fertiliser and lime were halted in 1985. After 15 years of no inputs, the parcel was included in an agri-environment scheme option (Welsh Government, 2019) that focused on conversion of semi-improved grassland to unimproved. This involved taking a crop of hay for the initial 3 years of the agreement (2000-2003) with no application of organic/inorganic fertilisers or lime and very low stocking rates (Table 1).

Gamma-ray soil scanning survey
All study fields were scanned with a commercially-available gamma-ray sensor installed 0.5 m above the ground on a four-wheel-drive vehicle. The scanner measures the decay of the natural radioisotopes of uranium (²³⁸U), caesium (¹³⁷Cs), thorium (²³²Th) and potassium (⁴⁰K). Measurements were recorded at a rate of 800 data points ha⁻¹ by controlling the travel speed (see Fig. S1 for the set-up). A minimum of four physical soil samples (0-100 mm depth) per 10 ha were collected, analysed and used for site-specific calibration. Sample locations were determined through a neighbourhood search analysis to ensure that samples represented maximum spatial variability. Raw gamma-ray data processing for soil pH, P, K, Mg and SOM estimation was carried out by the agri-tech business; information on this process was not provided due to commercial sensitivity.
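The stated sampling rates translate directly into survey effort per field. The sketch below is hypothetical: it assumes calibration samples are rounded up to whole samples and that fields under 10 ha still receive four, and the example areas are invented because the field areas are not given in this section.

```python
import math


def gamma_survey_effort(area_ha, points_per_ha=800, calib_per_10ha=4):
    """Expected GRS scan points and minimum calibration samples for one field.

    Uses the rates stated in the text: 800 scan points per ha and at least
    four physical calibration samples per 10 ha. Assumptions: calibration
    samples are rounded up to whole samples, and a field smaller than 10 ha
    still receives the full minimum of four.
    """
    if area_ha <= 0:
        raise ValueError("field area must be positive")
    scan_points = round(area_ha * points_per_ha)
    calib_samples = max(calib_per_10ha, math.ceil(area_ha / 10 * calib_per_10ha))
    return scan_points, calib_samples


# Hypothetical field areas; the areas of LL, ML and LUp are not given here.
for area in (3.5, 25.0):
    print(area, gamma_survey_effort(area))
```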

Visible and near-infrared spectroscopy soil scanning survey
Study fields, except for LUp, were scanned with an optical module with 660 nm red and 940 nm near-infrared wavelengths to estimate SOM. The module was mounted between two discs that form a V-shaped slot in the soil, which allows data to be collected approximately 40 mm below the soil surface on 15-20 m transects. Approximately 180-240 optical data points ha⁻¹ were collected across the two fields by controlling the travel speed. For further detail on methodology, please refer to Kweon et al. (2013). LUp was not scanned due to the rocky terrain, which would have damaged the module. Data were processed by the agri-tech business; information on this process was not provided due to commercial sensitivity.

Soil sampling
To validate the results of scanning against manually collected soil samples, a total of 225 soil samples were taken across the three study fields (Fig. 1). Ninety percent of the sample locations were allocated according to a spatial coverage design, which aims to minimise the distance between a random location in the field and the nearest sample point (Walvoort et al., 2010). This sampling design ensures good coverage of the study area and supports spatial mapping of measured soil properties by the best linear unbiased predictor from a spatial linear model. The coverage design was implemented by the k-means method as encoded in the spcosa package for the R platform (R Core Team, 2019; Walvoort et al., 2010), with the total sample effort divided between the fields to give a uniform sampling density. The distribution of sample points is shown in Fig. 1 (mean distance between each sample and its nearest neighbour = 21.8 m, min = 21.7 m and max = 22.1 m). The remaining 10% of sampling locations were randomly allocated 1 m away from a random subset of the first 90% of sample locations (Fig. 1). These are referred to as 'close pairs' and help support estimation of spatial covariance parameters (Lark & Marchant, 2018). At each location, soil samples were collected to a depth of 75 mm, in line with national agronomic soil sampling guidelines for grasslands as outlined in the UK fertiliser manual, RB209 (AHDB, 2020).
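The spatial coverage design can be approximated with plain k-means on a discretised field. The following is a minimal sketch in the spirit of the spcosa approach, not its implementation: the rectangular field, grid spacing, number of sites, helper names and random seeds are all illustrative assumptions. It also adds 'close pairs' 1 m from a random 10% of sites and reports nearest-neighbour distances like those summarised for Fig. 1.

```python
import math
import random


def grid_nodes(width_m, height_m, spacing_m):
    """Discretise a rectangular field into grid nodes (a hypothetical stand-in
    for the real field polygon)."""
    nx, ny = int(width_m // spacing_m), int(height_m // spacing_m)
    return [(spacing_m * (i + 0.5), spacing_m * (j + 0.5))
            for i in range(nx) for j in range(ny)]


def kmeans_coverage(nodes, k, n_iter=100, seed=1):
    """Lloyd's k-means on grid nodes; the k centroids are the sample sites.
    This locally minimises the mean squared distance from a node to its
    nearest site."""
    rng = random.Random(seed)
    centres = rng.sample(nodes, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for node in nodes:
            nearest = min(range(k), key=lambda c: math.dist(node, centres[c]))
            clusters[nearest].append(node)
        updated = [
            (sum(x for x, _ in members) / len(members),
             sum(y for _, y in members) / len(members)) if members else centres[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centres:  # converged: assignments no longer change
            break
        centres = updated
    return centres


def close_pairs(sites, fraction=0.1, offset_m=1.0, seed=2):
    """Add 'close pair' points offset_m from a random subset of the sites."""
    rng = random.Random(seed)
    extra = []
    for x, y in rng.sample(sites, round(fraction * len(sites))):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        extra.append((x + offset_m * math.cos(angle), y + offset_m * math.sin(angle)))
    return extra


def nearest_neighbour_distances(points):
    """Distance from each point to its nearest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


nodes = grid_nodes(100.0, 50.0, 5.0)      # 200 nodes in a 100 m x 50 m field
sites = kmeans_coverage(nodes, k=10)
pairs = close_pairs(sites)                # 10% of 10 sites gives 1 close pair
nn = nearest_neighbour_distances(sites)
print(len(sites), len(pairs), round(min(nn), 1), round(max(nn), 1))
```

Using grid nodes rather than continuous space keeps the example dependency-free; a finer grid gives centroids closer to the continuous optimum.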

Soil analysis
Soil samples were homogenised by hand prior to analysis, and vegetation and stones were removed. Soil moisture was calculated as the percentage mass loss after oven drying (105 °C, 24 h). SOM was determined by loss-on-ignition at 550 °C for 3 h (Hoogsteen et al., 2015). A subsample of each sample was sent to NRM Laboratories Ltd., Bracknell, UK (a division of Cawood Scientific Ltd.), the largest commercial laboratory for agronomic soil analysis in the UK. The laboratory holds ISO/IEC accreditation (ISO/IEC 2017) from UKAS (Staines-upon-Thames, UK). Samples were submitted for the standard agronomic soil analysis recommended by UK nutrient management guidelines (AHDB, 2020). Measurements included soil pH, the soil P index (Olsen P method, Hislop & Cooke, 1968) and the Mg and K indices (ammonium nitrate soil extract, BS:3882:2007, British Standards Institute, London, 2007).
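Both gravimetric determinations above are simple mass-loss percentages. A minimal sketch, assuming moisture is expressed on a fresh-mass basis (the text does not specify the basis) and SOM on an oven-dry basis; the function names and masses are hypothetical.

```python
def gravimetric_moisture_pct(fresh_g, oven_dry_g):
    """Soil moisture as % mass loss after oven drying (105 C, 24 h).
    Assumption: expressed on a fresh-mass basis; divide by oven_dry_g
    instead for a dry-mass basis."""
    if not 0 < oven_dry_g <= fresh_g:
        raise ValueError("expected 0 < oven-dry mass <= fresh mass")
    return 100.0 * (fresh_g - oven_dry_g) / fresh_g


def loi_som_pct(oven_dry_g, ignited_g):
    """SOM by loss-on-ignition (550 C, 3 h) as % of oven-dry mass."""
    if not 0 < ignited_g <= oven_dry_g:
        raise ValueError("expected 0 < ignited mass <= oven-dry mass")
    return 100.0 * (oven_dry_g - ignited_g) / oven_dry_g


# Hypothetical masses (g), not measurements from this study.
print(gravimetric_moisture_pct(50.0, 40.0))
print(loi_som_pct(10.0, 9.2))
```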

Statistical analyses
For each validation sampling point, the nearest neighbouring GRS and Vis-NIR scanning points were determined using the nn2 function in the RANN package for the R platform (Arya et al., 2019). Exploratory statistics were computed for the measured soil properties and for the extracted predictions at the nearest neighbouring point on the prediction grid. In no case did this analysis show outlying observations, and in no case were the data skewed to the extent that a transformation would normally be required (Webster & Lark, 2019). A plot of the predicted values against the measured values was also produced for each variable. A classified post-plot was produced for the predicted and measured values; this shows their spatial distribution, with the quartiles of the data displayed in different colours, to give an initial impression of the spatial variation. Because the sample sites for soil measurement were not selected independently and at random, standard design-based methods of analysis could not be used (Lark & Cullis, 2004). For this reason, linear mixed models were used, firstly to estimate parameters of a "null" model for the soil measurements (in which the only fixed effect is a constant mean), and secondly to estimate parameters of a model in which the measured soil properties are treated as a linear function of the nearest neighbouring predicted value. In both cases, the random effects comprised an independent and identically distributed between-field Gaussian random variable, a spatially correlated within-field Gaussian random variable of mean zero with an exponential covariance function, and an independent and identically distributed within-field residual component.
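The nearest-neighbour matching step pairs each validation sample with one scanner prediction. The sketch below uses a brute-force search instead of the k-d tree used by RANN's nn2; it returns the same neighbours but is only practical for modest data sets. Function and variable names, coordinates and pH values are hypothetical.

```python
import math


def nearest_scan_predictions(validation_xy, scan_xy, scan_values):
    """Pair each validation site with the prediction at its nearest scan point.

    Brute-force search over all scan points (an assumption for illustration;
    a k-d tree gives the same answer faster).
    Returns (matched predictions, distances to the matched scan points).
    """
    if len(scan_xy) != len(scan_values) or not scan_xy:
        raise ValueError("scan coordinates and values must be non-empty and aligned")
    matched, distances = [], []
    for site in validation_xy:
        j = min(range(len(scan_xy)), key=lambda i: math.dist(site, scan_xy[i]))
        matched.append(scan_values[j])
        distances.append(math.dist(site, scan_xy[j]))
    return matched, distances


# Hypothetical coordinates (m) and GRS pH predictions.
scan_xy = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
scan_ph = [5.5, 6.0, 6.5]
preds, dists = nearest_scan_predictions([(1.0, 1.0), (9.0, 1.0), (1.0, 8.0)], scan_xy, scan_ph)
print(preds)
```

Returning the matching distances as well makes it easy to flag validation samples that lie unusually far from any scan point.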
The parameters of the random effects were estimated in each case by residual maximum likelihood, and the fixed-effect parameters (the constant mean in the null model, and the intercept and regression coefficient in the second model) were then estimated by generalized least squares, along with their standard errors, as described by Lark and Cullis (2004). The residual maximum likelihood estimates of the variance parameters were obtained by minimizing the negative residual log-likelihood with the simplex algorithm of Nelder and Mead (1965), as encoded in the optim function in base R (R Core Team, 2019). Once the second model had been fitted, the null hypothesis that the true value of the fixed-effect coefficient for the sensor-derived variable was zero (i.e., no relationship) was tested by computing the log-likelihood ratio statistic L. Because residual likelihoods are not comparable between models with different fixed-effects structures, this was done following the method of Welham and Thompson (1997), as presented by Marchant et al. (2009), under which the null model is refitted with the projection matrix of the second model. Under the null hypothesis, L is asymptotically distributed as chi-squared with degrees of freedom equal to the number of additional fixed effects in the second model (here, 1).
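With one additional fixed effect, the P-value for L follows from the chi-squared distribution with 1 degree of freedom, whose upper tail can be written with the complementary error function. A minimal sketch, assuming the two comparable log-likelihoods (the null model refitted with the second model's projection matrix) have already been computed; the function name and log-likelihood values are invented.

```python
import math


def lr_test_one_df(loglik_null, loglik_full):
    """Log-likelihood ratio statistic L and its asymptotic P-value for ONE extra
    fixed effect (the sensor-derived predictor).

    The two log-likelihoods must be comparable, e.g. the null model refitted
    with the second model's projection matrix (Welham & Thompson, 1997).
    For a chi-squared variable with 1 df, P(X > L) = erfc(sqrt(L / 2)).
    """
    L = 2.0 * (loglik_full - loglik_null)
    if L < 0:
        raise ValueError("the larger model should not have a lower log-likelihood")
    return L, math.erfc(math.sqrt(L / 2.0))


# Hypothetical log-likelihoods chosen so that L equals the 5% critical value.
L, p = lr_test_one_df(-120.0, -120.0 + 3.841458820694124 / 2.0)
print(round(L, 3), round(p, 3))
```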

Soil management calculations
Lime application rates were calculated using the recommended nutrient management guidelines (AHDB, 2020). Calculations were based on soil pH and were made for both soil pH determined through validation samples and estimated pH measurements from GRS scanning; calculations were adjusted with the aim of reaching an optimum pH of 6.5 through lime application. Phosphate, potassium and magnesium application rates are typically determined by their soil P, K and Mg indices, respectively. For grassland systems, index 0 represents very low fertility, index 1 is low, index 2 is adequate and index 3 and above indicates unnecessarily high fertility (AHDB, 2020). P, K and Mg indices were allocated for both validation samples and estimated predictions using ranges detailed in Table 2 (See Fig. 2).
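Allocating an index is a lookup of a measured or predicted concentration against ordered band boundaries. In the sketch below, the boundaries are assumed, illustrative values (they are not copied from Table 2), and K includes the 2- and 2+ sub-indices used for grassland; the names are hypothetical, and the Table 2 ranges should be substituted before any real use.

```python
from bisect import bisect_right

# ASSUMED, illustrative band boundaries: the lower bound (mg/l) of each index
# above index 0. Not taken from Table 2; the lists stop at index 4.
OLSEN_P_BOUNDS = [10, 16, 26, 46]
OLSEN_P_LABELS = ["0", "1", "2", "3", "4"]
K_BOUNDS = [61, 121, 181, 241, 401]
K_LABELS = ["0", "1", "2-", "2+", "3", "4"]


def soil_index(value, lower_bounds, labels):
    """Map a laboratory (or GRS-predicted) concentration to its soil index.

    lower_bounds must be ascending; a value equal to a bound falls in the
    higher index, and values above the last bound get the last label.
    """
    if len(labels) != len(lower_bounds) + 1:
        raise ValueError("need exactly one more label than boundaries")
    return labels[bisect_right(lower_bounds, value)]


print(soil_index(20, OLSEN_P_BOUNDS, OLSEN_P_LABELS))
print(soil_index(150, K_BOUNDS, K_LABELS))
```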

Results
There was only weak evidence of a relationship between the predicted and measured values of pH, P, K and Mg in grassland systems, as shown by the scatter plots, the spatially classified post-plots and/or the differences between management recommendations based on GRS predictions and those based on validation measures (Table 3 and Figs. 3 & 4). However, both GRS and Vis-NIR predictions of SOM were significantly related to the validation measurements, although the predictions did not appear to account for within-field variation (Table 8 and Figs. 4 & 5).

Comparison of soil pH measured either in situ or ex situ
A relationship between GRS-predicted and laboratory-measured soil pH across the three fields was found (Fig. 3); however, there was much more variation in the validation measurements than in the GRS predictions (Fig. 3). There was some evidence that GRS is appropriate for predicting soil pH, as the fixed-effect coefficient for the sensor-derived pH prediction in the second model for the observed pH values was positive (Table 3), the coefficient was more than twice its standard error, and the P-value for the log-likelihood ratio statistic was 0.02. However, the correlated within-field variance was only reduced by 20% when the GRS-derived values were included as a predictor (Table 3), leaving considerable uncertainty in GRS soil pH predictions. The classified post-plots show that both GRS soil pH predictions and validation measures agree that the LUp field is dominated by acidic soils (points in the bottom quartile, Fig. 2). However, in the other two fields, LL and ML, the soil pH estimated by GRS showed that the more neutral samples (the top quartile, Fig. 2) form a continuous block, which was not observed in the validation measurements (Fig. 2). In terms of management, only 20% of the liming application rates calculated from GRS-predicted soil pH were the same as those calculated from physical soil samples (validation measures); 24% of GRS-based rates overestimated and 56% underestimated the liming requirement (Table 4).

Fig. 2 Quantile classified post-maps of soil pH and P for soil samples analysed ex situ in a centralised testing laboratory (validation measures) versus quantiles of estimates obtained from in situ gamma-ray spectroscopy (GRS) across three fields (n = 95 for LL, n = 51 for ML and n = 79 for LUp; see Fig. 1)

Comparison of soil P measured either in situ or ex situ
As with soil pH, the classified post-plots for GRS-predicted and validation-measured soil P show that the predicted values in the top two quartiles occurred exclusively in LL and ML (Fig. 2). The measured soil P values in the top two quartiles occurred predominantly, but not exclusively, in these fields. There is a positive fixed-effect coefficient for the sensor-derived P prediction in the second model for the observed values, but the coefficient is less than twice its standard error and the P-value for the log-likelihood ratio statistic is 0.15, indicating that GRS is neither suitable for predicting soil P (Table 3) nor useful for evaluating within-field variation. For grasslands, the target soil P index for optimal grass growth is 2. Soil P indices are used to determine phosphate application rates; here, only 27% of soil P indices calculated from GRS-predicted soil P were the same as those determined from physical soil samples (validation measures), and so gave the same recommendations for P applications (Table 5). However, 48% of the GRS samples overestimated soil P indices, potentially resulting in under-application of phosphorus (this applies only where the GRS estimate exceeds a measured index of 2 or below), and 25% underestimated them, resulting in over-application of P (Table 5).
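The agreement percentages reported in Tables 5-7 amount to comparing the rank of the GRS-derived index with the rank of the laboratory index at each site. A minimal sketch, with a hypothetical function name and invented data; it also counts the subset of overestimates that would actually change a recommendation, i.e. those where the measured index is at or below the target index, as noted in the parenthesis above.

```python
def compare_indices(measured, predicted, order, target):
    """Percentage of sites where the GRS-derived index equals, exceeds or falls
    below the laboratory index.

    An overestimated index implies a lower recommended rate (risk of
    under-application), but only changes the recommendation where the measured
    index is at or below the target; 'over_affects_rate' counts those cases.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("measured and predicted must be non-empty and aligned")
    rank = {label: i for i, label in enumerate(order)}
    counts = {"same": 0, "over": 0, "under": 0, "over_affects_rate": 0}
    for m, p in zip(measured, predicted):
        diff = rank[p] - rank[m]
        if diff == 0:
            counts["same"] += 1
        elif diff > 0:
            counts["over"] += 1
            if rank[m] <= rank[target]:
                counts["over_affects_rate"] += 1
        else:
            counts["under"] += 1
    n = len(measured)
    return {key: 100.0 * value / n for key, value in counts.items()}


# Hypothetical P indices for four sites (not data from this study).
summary = compare_indices(["2", "2", "1", "3"], ["2", "3", "0", "4"],
                          order=["0", "1", "2", "3", "4"], target="2")
print(summary)
```

Passing an explicit `order` lets the same function handle the K sub-indices, e.g. `["0", "1", "2-", "2+", "3", "4"]` with `target="2-"`.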

Comparison of soil K measured either in situ or ex situ
There was a notable spatial pattern in the predicted values, with all observations in the bottom quartile occurring in LUp, and all the values in the top quartile and all but one in the third occurring in LL and ML (Fig. 2). The spatial pattern of the observed values is quite different, with all but two of the observations in the bottom quartile occurring in LL and ML (Fig S1). There is a positive fixed-effect coefficient for the sensor-derived K prediction in the second model for the observed values, and the coefficient is just in excess of twice its standard error, but the P-value for the log-likelihood ratio statistic (P = 0.06) indicates only very weak evidence that GRS is suitable for predicting soil K concentrations (Table 3). As with soil P and pH, these results do not suggest that the sensor-derived predictions of soil K are useful for evaluating within-field variation. The target soil K index for optimal grass growth is 2-; only 23% of soil K indices calculated from GRS-predicted soil K were the same as the indices determined from physical soil samples (validation measures), and so gave the same recommendations for potash applications (Table 6). However, 48% of GRS predictions overestimated soil K indices, potentially resulting in under-application of K (this applies only where the GRS estimate exceeds a measured index of 2+ or below), and 29% underestimated them, resulting in over-application of K (Table 6).

Comparison of soil Mg measured either in situ or ex situ
Neither the post-plots nor the scatter plots suggest a strong relationship between the predicted and measured soil Mg concentrations (Figs. 2 and 3). The fixed-effect coefficient for the predicted Mg in the model for the measured values is positive, but only slightly larger than its standard error, and the P-value for the null hypothesis that the coefficient is zero (P = 0.26) shows that GRS is not suitable for predicting soil Mg (Table 3). As before, these results do not suggest that the GRS-derived predictions of soil Mg are useful for evaluating within-field variation, as reflected in the discrepancies seen in the quartile plots (Fig. S1) and in the soil Mg indices used to determine Mg application rates. Only 43% of soil Mg indices calculated from soil Mg estimated by in situ GRS were the same as the indices from soil samples analysed ex situ in a centralised testing laboratory (Table 7), whereas 49% of samples overestimated soil Mg indices and 8% underestimated them.

Comparison of soil organic matter measured either in situ or ex situ
This study provides strong evidence that both GRS and Vis-NIR technology can estimate between-field differences in SOM: the scatter plots show strong relationships between predicted and measured SOM for both GRS and Vis-NIR (Fig. 5). This is confirmed by the positive fixed-effect coefficients, which are four times their standard error for GRS and three times for Vis-NIR (Table 8). Note, however, that while the between-field variance components are markedly smaller for the regression models than for the null models, the other variance components are little changed, as are the distance parameters of the within-field variance components for both GRS and Vis-NIR. This indicates that GRS and Vis-NIR predictions effectively capture the differences between the fields but provide little information about within-field variation, as the magnitude and spatial structure of the within-field variation in the regression model errors are very similar to those in the null model. GRS and Vis-NIR should not be compared directly, as the Vis-NIR dataset is smaller owing to issues associated with the terrain (see Sect. 2.3). The GRS post-plots show that, for both GRS-estimated and measured SOM, the more organic soils are found in LUp, and LL soils are more organic than ML soils (Fig. 3). The Vis-NIR post-plots show that the ML field has higher SOM than the LL field.

Table 8 Summary validation statistics for linear mixed models estimating parameters of the "null" model and the model based on SOM estimates obtained from in situ gamma-ray spectroscopy (GRS) and in situ visible and near-infrared spectroscopy (Vis-NIR). Sample sizes differ between the GRS and Vis-NIR data sets because of terrain issues for the Vis-NIR module, which scanned two fields while GRS scanned all three (gamma-ray n = 225 and Vis-NIR n = 146, see Sect.
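The interpretation of Table 8 rests on comparing each variance component between the null and regression models; the same calculation underlies the 20% reduction reported for the within-field component of soil pH. A minimal sketch, with a hypothetical function name and component values invented for illustration (they are not the Table 8 estimates):

```python
def variance_reduction_pct(null_components, model_components):
    """Percentage reduction in each variance component when the sensor
    prediction is added as a fixed effect to the null model."""
    return {name: 100.0 * (null_components[name] - model_components[name])
            / null_components[name]
            for name in null_components}


# Invented illustrative values, NOT the estimates reported in Table 8.
null = {"between_field": 4.0, "within_field_correlated": 1.0, "residual": 0.5}
with_sensor = {"between_field": 0.4, "within_field_correlated": 0.95, "residual": 0.5}
reduction = variance_reduction_pct(null, with_sensor)
print(reduction)
```

A large reduction in the between-field component alongside near-zero reductions in the within-field components is the pattern that, in the text above, indicates predictions capturing field means but not within-field variation.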

In situ versus ex situ soil testing for nutrient status
This study determined that the commercially-available GRS soil mapping services examined are not currently accurate enough to replace conventional soil index testing on grassland soils. They were not able to accurately measure between-field and within-field variations in soil pH, P, K and Mg across three contrasting pastures that were representative of most UK agricultural grasslands. Although these findings are not in line with other studies that demonstrate the potential for GRS technologies to measure soil pH, P, K and Mg (Reinhardt & Herrmann, 2019; Wong & Harper, 1999), this disparity may be due to the focus here on grassland soils and on evaluating the technology within a commercial context. These findings suggest that commercial soil mapping services widely used in an arable setting require further development and specific calibration for grassland soils to avoid providing poor-quality information. Inaccurate mapping services can have major implications, including inappropriate soil management decisions (see Sect. 4.3) and poor adoption of precision agriculture technology due to negative farmer perceptions (see Sect. 4.4). Because this study evaluated soil mapping services from the perspective of an end user (e.g., a grassland farmer), the mechanisms that contributed to the observed inaccuracies could not be explored. Further research on the calibration of these sensor technologies for grassland soils is therefore needed, explicitly comparing and contrasting the proxy relationships between the measured and target variables in arable and pasture soils. Commercial organizations offering soil-mapping services based on proximal sensing technologies must pay careful attention to whether the conditions in which the services are applied are close enough to the development and calibration environments to ensure that predictions are of adequate quality.
Truly independent validation of technologies in independent locations, as in the study reported here, is necessary to avoid the over-confidence that arises from assuming that a technology can be translated from the development to the implementation environment without loss of accuracy or precision. The implementation environment may be more heterogeneous and complex than the development and calibration environment (Sumberg, 2012), and care is needed to ensure that the calibrations translate across what Visser et al. (2021) call the implementation gap. The SOM results also highlight the importance of scale. When developing a technology to predict soil properties, it is important to examine the scale-dependence of the relationship between the target property and the proxy measurement. These results showed that sensor predictions of SOM captured the large-scale (between-field) variation. However, within-field variation (the scale at which the technology is promoted) was not captured. Validation of a technology for use at the within-field scale must be done at that scale.

In situ versus ex situ soil testing for soil organic matter
This study showed that the commercially-available GRS and Vis-NIR technologies evaluated can effectively monitor between-field variations of SOM in grassland systems, although the calibrations used to process the data were biased. This is in line with the wider literature, which has shown GRS and Vis-NIR spectra to successfully predict SOM (Hummel et al., 2001; Rossel et al., 2006; Wenjun et al., 2014). Gamma-ray signals are not directly sensitive to changes in SOM because of a relatively low signal-to-noise ratio (Reinhardt & Herrmann, 2019), suggesting that the existing algorithms are sophisticated enough to predict SOM on grassland soils. However, it must be highlighted that this was only achieved at the between-field scale, as neither GRS nor Vis-NIR effectively predicted the within-field variation. Nonetheless, these findings show that commercial tools offer promising opportunities to meet the increasing demand for SOM monitoring tools, particularly when new policies and/or agri-environment schemes that reward farmers for conserving and improving SOM are implemented (Lal et al., 2015; Montanarella & Panagos, 2021).

Implications for soil management
These findings demonstrate that the GRS soil mapping services examined are not currently accurate enough to replace conventional soil index testing for calculating variable application rates of lime and organic/inorganic fertilisers on grassland soils (Tables 4, 5, 6, 7). The inaccurate estimates of soil pH, P, K and Mg could have major consequences for farm productivity and profit, as well as for nutrient losses to the environment. Where the sensors lead to underestimated lime and/or fertiliser application rates, reduced grass yields may force farmers to buy in additional feed, with its associated costs. Where they lead to overestimated fertiliser application rates, the risk of nutrient losses to the environment increases substantially, which contradicts the primary purpose of adopting these tools.
These findings also demonstrate how field maps can be useful for highlighting fields with lower SOM content, allowing farmers to link SOM with grass yields and to identify fields that may have been vulnerable to losses through erosion, compaction and/or overgrazing. For example, in this study it was possible to determine that SOM content was lower in the ML field than in the LL field, likely because the LL field has variable drainage and experiences seasonal flooding, unlike ML. These tools therefore go beyond monitoring trends in SOM: they could also be used to monitor the success of soil management practices that aim to increase carbon sequestration and SOM content, and to support national and global soil initiatives such as the 4 per 1,000 initiative (Minasny et al., 2017).

Implications for soil mapping technology adoption by grassland farmers
Commercially-available soil mapping technologies that are not fully reliable pose substantial risks, both to the farmers who use them and to the future adoption of such technologies. This could have economic impacts on farm businesses and erode relationships between farmers and agribusinesses (Eastwood & Renwick, 2020). Uptake of soil testing is already low amongst grassland farmers (Rhymes et al., 2021); therefore, encouraging the adoption of new soil mapping technologies that are not ready for use on grassland farms could exacerbate low soil testing uptake and jeopardise the future uptake of precision agriculture technologies.
To foster the uptake of soil mapping and precision farming technologies, the data provided by these services must add value beyond conventional soil testing methods. This study provides evidence of poor within-field SOM predictions, meaning that the soil scanning data offer, in essence, no additional value over the data obtained from conventional soil testing. In turn, there are concerns about the economic viability of these services, as conventional soil testing is substantially cheaper to implement (~ £10 per field) than soil mapping services (~ £20 per ha).
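Because conventional testing is charged per field and mapping per hectare, the cost gap grows with field size. The sketch below works through that arithmetic using only the indicative prices quoted above; it ignores sampling labour, travel and any other charges, and the farm layout and function names are hypothetical.

```python
# Indicative prices quoted in the text (GBP); real quotes will vary.
CONVENTIONAL_COST_PER_FIELD = 10.0
MAPPING_COST_PER_HA = 20.0


def farm_costs(field_areas_ha):
    """Return (conventional, mapping) costs in GBP for a list of field areas."""
    conventional = CONVENTIONAL_COST_PER_FIELD * len(field_areas_ha)
    mapping = MAPPING_COST_PER_HA * sum(field_areas_ha)
    return conventional, mapping


def break_even_area_ha():
    """Field area at which mapping one field costs the same as one conventional test."""
    return CONVENTIONAL_COST_PER_FIELD / MAPPING_COST_PER_HA


# Hypothetical farm with three fields of 4, 6 and 10 ha.
conventional, mapping = farm_costs([4, 6, 10])
print(f"conventional: £{conventional:.0f}, mapping: £{mapping:.0f}")
print(f"mapping is cheaper only for fields under {break_even_area_ha()} ha")
```

For this hypothetical 20 ha farm, mapping costs £400 against £30 for conventional testing, more than 13 times as much, and at these prices mapping would only be cheaper for fields smaller than 0.5 ha.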

Commercial recommendations
The lack of methodological transparency from commercial soil mapping agribusinesses makes it difficult for end users to critically assess the reliability and suitability of these services for their intended purposes (Padarian et al., 2020). To address this issue, it is highly recommended that soil mapping services are made certifiable, similar to the existing ISO accreditation scheme (ISO/IEC, 2017) for laboratories that provide soil analysis services. However, this will be challenging, as the existing ISO accreditation is not suitable for in situ methods and is likely to come at an additional cost. A potential solution could be a network of trial sites, representing a wide range of soil conditions and land uses, where the technologies can be evaluated for quality assurance purposes. Agribusinesses could then be issued certification based on prediction accuracy evaluated at both within-field and between-field scales. It is also highly recommended that agribusinesses provide independent accuracy estimates with each report they deliver (e.g., per field scanned), which would allow end users to critically assess the maps provided to them.
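A per-field accuracy statement of the kind recommended above could be computed from laboratory-analysed validation samples collected in the scanned field. The text does not prescribe which metrics to report; as an assumption, the sketch below uses three common ones: mean bias, root mean square error (RMSE) and Lin's concordance correlation coefficient (CCC), which penalises both scatter and systematic offset. The function name and the data are hypothetical.

```python
from math import sqrt


def accuracy_report(lab, predicted):
    """Per-field accuracy of sensor predictions against laboratory values.

    Errors are predicted minus laboratory, so a positive bias means the
    service overestimates. Variances and covariance use the 1/n form, as in
    Lin's definition of the concordance correlation coefficient.
    """
    n = len(lab)
    if n != len(predicted) or n < 2:
        raise ValueError("need at least two paired samples")
    errors = [p - o for o, p in zip(lab, predicted)]
    bias = sum(errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mean_o, mean_p = sum(lab) / n, sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in lab) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(lab, predicted)) / n
    denom = var_o + var_p + (mean_o - mean_p) ** 2
    if denom == 0:
        raise ValueError("no variation or offset to assess")
    ccc = 2 * cov / denom
    return {"n": n, "bias": bias, "rmse": rmse, "ccc": ccc}


# Hypothetical validation samples from one scanned field (e.g. SOM, %).
report = accuracy_report(lab=[5.0, 6.0, 7.0], predicted=[6.0, 7.0, 8.0])
print(report)
```

Here the predictions are perfectly correlated with the laboratory values but offset by one unit, so the bias and RMSE are both 1.0 and the CCC drops to about 0.57 (4/7). A Pearson correlation alone would report 1.0 and hide the offset, which is why a bias-sensitive metric is worth including when calibrations may be biased.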
Extensive site-specific instrument calibration is integral to generating more refined and usable data (Rehman et al., 2019). In this approach, conventional soil 'validation' samples are taken across the digitally mapped field and analysed in the laboratory, and the results are used for site-specific calibration of the algorithms. Enforcing a minimum number of validation samples per farm, field or soil type is therefore also recommended. The numbers recommended in the literature are typically high (ca. 40 for predicting Mg; Li et al., 2019) compared with the number actually taken in a commercial context (in the region of 4 per ha); further research is required to determine a cost-effective threshold.
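The site-specific calibration step can be sketched as a simple linear correction: laboratory values at the validation points are regressed on the sensor predictions at the same points, and the fitted line is then applied to every mapped point. This is an illustrative assumption only; the algorithms used by commercial operators are not disclosed, and the function names and numbers below are hypothetical.

```python
def fit_local_calibration(sensor_at_samples, lab_at_samples):
    """Least-squares line lab = intercept + slope * sensor, fitted on the
    validation points where both sensor and laboratory values exist."""
    n = len(sensor_at_samples)
    if n != len(lab_at_samples) or n < 2:
        raise ValueError("need at least two paired validation samples")
    mean_x = sum(sensor_at_samples) / n
    mean_y = sum(lab_at_samples) / n
    sxx = sum((x - mean_x) ** 2 for x in sensor_at_samples)
    if sxx == 0:
        raise ValueError("sensor values at the samples do not vary")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sensor_at_samples, lab_at_samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apply_calibration(sensor_map, slope, intercept):
    """Correct every mapped sensor value with the fitted line."""
    return [intercept + slope * x for x in sensor_map]


# Hypothetical: four validation samples at which the sensor reads 1 unit low.
slope, intercept = fit_local_calibration([4.0, 5.0, 6.0, 7.0], [5.0, 6.0, 7.0, 8.0])
corrected = apply_calibration([4.5, 6.5], slope, intercept)
print(slope, intercept, corrected)
```

With only a handful of validation samples, the fitted slope and intercept depend heavily on individual samples, which is one reason the minimum sample number discussed above matters.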
Lastly, there is emerging evidence that combining different proximal soil sensing technologies (e.g., GRS, electrical conductivity and Vis-NIR) can significantly improve prediction accuracy (Ji et al., 2019; Vasques et al., 2020). It is therefore recommended that commercial contractors consider these data fusion techniques and that such techniques are appropriately validated for different systems, including grasslands.
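In its simplest form, data fusion stacks covariates from several sensors into one regression model. The sketch below fits ordinary least squares with one hypothetical summary covariate per sensor (GRS, EC, Vis-NIR) on synthetic data and compares the in-sample R² with a GRS-only model. Real Vis-NIR data are full spectra and published fusion methods are usually more elaborate; the data, names and model form here are illustrative assumptions.

```python
import random


def ols_fit(rows, y):
    """Ordinary least squares with an intercept, via the normal equations.

    rows: list of feature lists (one per sample, no intercept column).
    Returns coefficients [b0, b1, ..., bk].
    """
    X = [[1.0, *r] for r in rows]
    k = len(X[0])
    # Normal equations A b = c, with A = X^T X and c = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    c = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("covariates are collinear")
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            c[r] -= f * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def r_squared(rows, y, b):
    """In-sample coefficient of determination for fitted coefficients b."""
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic, hypothetical data: SOM depends on all three covariates plus noise.
rng = random.Random(42)
grs = [rng.uniform(0, 1) for _ in range(30)]
ec = [rng.uniform(0, 1) for _ in range(30)]
nir = [rng.uniform(0, 1) for _ in range(30)]
som = [2.0 + 0.5 * g + 0.8 * e + 0.3 * n + rng.gauss(0, 0.1)
       for g, e, n in zip(grs, ec, nir)]

single = [[g] for g in grs]
fused = [[g, e, n] for g, e, n in zip(grs, ec, nir)]
r2_single = r_squared(single, som, ols_fit(single, som))
r2_fused = r_squared(fused, som, ols_fit(fused, som))
print(f"GRS only R^2: {r2_single:.2f}; fused R^2: {r2_fused:.2f}")
```

Adding covariates can never lower the in-sample R² of a least-squares model, so an improvement of this kind is not, by itself, evidence of better predictions; fused models still need independent validation at the scale at which they will be used.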

Conclusions
This study showed that some commercially operating companies are providing inaccurate soil mapping services for predicting soil pH, P, K and Mg on temperate grasslands representative of most UK agricultural grasslands. Consequently, these methods were not appropriate for calculating variable application rates of lime and organic/inorganic fertilisers, which could lead to negative environmental and/or economic consequences. These findings suggest that the limitations of these technologies are poorly understood and that further research is required to evaluate their universal application. It is highly recommended that existing and future commercial soil mapping technologies are certified for quality assurance and are required to collect a minimum number of validation samples to ensure site-specific data accuracy. This is particularly important because negative farmer experiences could jeopardise the future adoption of technologies that facilitate sustainable agriculture.
Both the GRS and Vis-NIR soil scanning technologies studied here predicted SOM with good accuracy at the between-field scale on these grassland fields. However, their inability to identify within-field variation raises concerns about their economic viability. It is therefore essential that these technologies are critically evaluated for both within-field and between-field predictions, a distinction that the scientific community does not often account for. Nonetheless, with farmers likely to be rewarded for maintaining and improving SOM under future schemes, these soil scanning tools represent an important opportunity to support and validate such programmes. Moreover, farmers monitoring their own soils fosters better soil-farmer relations and, consequently, the adoption of more sustainable soil management practices.