Introduction

No detailed general maps of topsoil texture adapted for use at farm level exist for arable land in Sweden (about 2.6 million hectares in total). Instead, each farmer must perform soil sampling and pay for soil analyses. This is considered expensive and the number of samples obtained is often low. In addition, texture is seldom analysed. The Geological Survey of Sweden (SGU, Uppsala, Sweden) carries out Quaternary soil mapping of the country, but these maps reflect the conditions at 0.5 m depth, and their soil classification, which uses genetic classes based on the origin of the loose deposits (such as “postglacial clay”), differs from that used by farmers. Soil management recommendations in agriculture instead require detailed knowledge of, for example, the percentage of clay. The Swedish Board of Agriculture (Jordbruksverket, Jönköping, Sweden) has recently completed two general soil sampling programmes to determine clay, sand and soil organic matter (SOM) content in arable topsoil. The first was at farm level in conjunction with the granting of environmental subsidies, which resulted in 22 500 soil analyses on one thousand farms in southern Sweden. In this case, analyses were performed for one sample per 3 ha on the participating farms. In the other sampling programme, soil samples were obtained on a regular grid, at a spacing of approximately 1000 m on agricultural land. In a joint project run by the Swedish University of Agricultural Sciences (SLU) and SGU, of which this study forms part, the intention was to combine the data from these soil sampling programmes with information in other publicly available datasets in order to make predictions of soil textural properties at a spatial resolution that enables discrimination of within-field variability and thereby use in precision agriculture.

Digital soil mapping (DSM) is currently being carried out in many countries and at different scales, including globally (e.g. Lagacherie and McBratney 2007; Arrouays et al. 2014; Hengl et al. 2014, 2015). DSM is defined as: ‘the creation and population of spatial soil information systems by the use of field and laboratory observational methods coupled with spatial and non-spatial soil inference systems’ (Lagacherie et al. 2006). Accordingly, the concept comprises three main components: the input data, the mathematical/statistical inference system and the spatial soil information systems from which maps and uncertainty measures can be produced (Minasny and McBratney 2016). The process of DSM involves determining soil variation in relation to the landscape, finding measurable proxy variables for the soil property of interest and developing quantitative (spatial or non-spatial) models for prediction of the target property. Jenny (1941) identified five major soil-forming factors and formulated a mechanistic model for soil development. These so-called ‘clorpt’ factors were: climate (cl), organisms (o), relief (r), parent material (p) and time (t). McBratney et al. (2003) proposed a generic framework for digital soil mapping based on Jenny’s approach, but also taking spatial dependence into account. This general approach has been widely accepted in the area of digital soil mapping and pedometrics. However, an appropriate scale, data layers, sampling strategy, data collection method, quantitative prediction model and validation strategy must be identified for each environment and application. In addition, Grunwald et al. (2011) claimed that the anthropogenic impact on soil properties needs to be considered. Today, remote sensing (Mulder et al. 2011) and proximal sensing (Viscarra Rossel et al. 2011) methods offer great opportunities for cost-efficient collection of measurement data with extensive spatial coverage. Data from multiple sensors can be combined (sensor data fusion) for improved prediction of soil properties. For example, Piikki et al. (2013) found that predictions of topsoil clay content were improved when data from an electromagnetic induction (EMI) proximal sensor were integrated with aerial photography images. In the present situation, many data layers are often available and over-fitted prediction models can become an issue. As is evident e.g. in the examples presented by McBratney et al. (2003), no general recommendations can be made on which predictors are the most powerful in a specific case (target variable, area, scale, prediction method).

The overall availability and spatial coverage of predictor data are obviously important in any project aiming at mapping on a national scale. Most of the land area in Sweden has already been mapped using airborne gamma-ray spectrometry, a remote sensing technique initially developed for uranium exploration that has been used for geological and geochemical mapping and for mineral exploration surveys since the 1960s (e.g. Wilford et al. 1997; IAEA 2003; Wilford and Minty 2007). Almost all gamma radiation detected by airborne gamma-ray sensors emanates from the radioactive decay of potassium (40K), uranium (238U) and thorium (232Th). Natural emissions of gamma rays from rock outcrops and soil reflect the type of bedrock, bedrock weathering and the processes of soil development (e.g. IAEA 2003; Wilford and Minty 2007; Wilford 2012). In digital soil mapping, the relationship between a number of soil properties and gamma-ray data has been successfully used for predicting e.g. weathering, soil parent material, soil texture and soil cadmium concentration (Cook et al. 1996; Wilford et al. 1997; Wong and Harper 1999; Pracilio et al. 2006; Rawlins et al. 2007; Van der Klooster et al. 2011; Söderström and Eriksson 2013). In addition, vehicle-mounted equipment for gamma-ray measurements has recently been developed and can be used for meticulous gamma-ray scanning (Van Egmond et al. 2010). The radius of the footprint of such an instrument is roughly four times the distance between the sensor and the ground (IAEA 2003), making very detailed mapping possible.

Digital elevation data are also commonly used in DSM. Reviews by e.g. McBratney et al. (2003) and Omran (2012) list numerous examples of studies examining the relationship between various soil properties and elevation, as well as derivatives such as slope, curvature and different topographical indices. In Sweden, a laser-scanned national digital elevation model with 2 × 2 m2 spatial resolution is available for most of the country and is therefore an interesting dataset to use for DSM purposes.

The primary aim of the present study was to develop and assess methods for regional digital soil mapping of topsoil texture (content of clay and sand) and soil organic matter (SOM) with respect to their potential for use at farm level in precision agriculture. An additional objective was to test whether some secondary soil variables estimated from texture and SOM that are directly used by farmers in their management decisions could be modelled in a similar manner. These secondary variables were target pH and buffering capacity, which together with soil pH are used for determination of liming requirement for pH correction in Sweden.

Materials and methods

Geology and geography of the study area

The study represents the initial phase in a country-wide effort to map all arable topsoil in Sweden and was implemented in a test area in the south-west of the country (Fig. 1). All datasets used are available at national level and the results are intended to guide future efforts to create a national digital map of arable soil. The study area is located in southwest Sweden around the town of Skara, about 90 km east of the Skagerrak coast. The area is bordered by Lake Vänern (44 m above sea level (asl)) in the northwest and covers in total 201 000 ha, of which 102 500 ha are arable land. SGU’s map of Quaternary deposits (QD) is the only national soil map of Swedish arable land that is available at a relatively detailed scale (1:50 000). A generalisation of that map is used in Fig. 1 to display the dominant type of deposit of the subsoil conditions (at 0.5 m depth). The topsoil is not mapped by SGU and no details are given of clay, sand or organic matter percentages; all deposits with a clay content above 15 % are shown as clay on the maps. As can be seen in Fig. 1, the arable land in the west of the study area is dominated by postglacial clay, covering in total about 46 000 ha of arable land. This soil material originates from a period after the melting of the Weichselian ice sheet about 12 000 years ago, when the region was covered by sea. This is the most low-lying part of the area and also the most intensely cultivated, with relatively large fields of mostly grain crops and oilseed rape. In the southwest and north of the study area, the soil texture is dominated by partly wave-washed, coarse material of glaciofluvial or fluvial origin. In addition to the crops mentioned above, potatoes are grown widely on these soils. Most soils in the area can be classified as Eutric Cambisols (IUSS Working Group WRB 2014). Further east, and separated from the area in the west by a district dominated by forestry (white area in Fig. 1), the land is topographically elevated (up to about 330 m asl), with a range of table mountains stretching from NNE to SSW. These mesas have a Cambrian–Ordovician origin and consist of sedimentary rocks such as sandstone, shales and limestone, whereas in the rest of the area the loose material is underlain by igneous and metamorphic felsic granites and gneisses. Soils formed from glacial till dominate the eastern sector of the map area (about 20 000 ha), which was not covered by water after the deglaciation. In some locations the regolith is very shallow, even marked as bedrock at 0.5 m depth in Fig. 1, and most of the organic soils (approximately 4600 ha in total with SOM content >40 %) occur in these locations. In this eastern area, where livestock production is frequent and ley, barley and oats are common crops, the soils can be classified as Cambisols, Leptosols and Histosols (IUSS Working Group WRB 2014), the latter corresponding to the organic deposits on the QD map.

Fig. 1

Distribution of the Grid and the Farms soil sample sets within the study area at Skara, southwest Sweden. A simplified version of SGU’s Quaternary deposit (QD) map displaying the dominant soil type at 0.5 m soil depth is shown in the background in the part of the area consisting of arable land. The arrow indicates Entorp Farm, which was used for exemplifying how the different digital soil mapping methods work locally

Soil sample datasets

The two reference soil datasets used for this digital soil mapping study, denoted Grid and Farms respectively, were obtained from the Swedish Board of Agriculture (Jönköping, Sweden). The Grid dataset consisted of about 12 500 soil analyses on samples obtained on a grid covering arable land in southern Sweden (Fig. 1). The soil samples were taken at a spacing of approximately 1000 m in areas of arable land. When it was not possible to collect a sample at the planned location, the sampling protocol, derived by Statistics Sweden (SCB, Örebro, Sweden), allowed the sampling site to be moved a maximum of 10 m towards the middle of the field, or otherwise the sample was not taken. The actual soil sample density therefore varied and also depended on the distribution of arable land. The Farms dataset comprised soil analyses clustered on farms spread all over southern Sweden, with one sample every 3 ha on the sampled farms. Eighty-two of these were located in the study area.

All soil samples in both datasets were taken from the topsoil (0–20 cm) within a subarea of 30–80 m2 (7–10 topsoil cores within a 3–5 m radius circle at each sampling point). The cores were pooled to one sample. In the laboratory, the samples were air dried at 35–40 °C, milled and sieved through a 2-mm mesh. The content of sand (0.2–2 mm) was determined by sieving and weighing and the content of clay (<2 µm) by the sedimentation method (Gee and Bauder 1986), where a soil suspension was prepared by adding sodium pyrophosphate and allowed to sediment under controlled temperature conditions. The clay content was then calculated from the density of the suspension at a specific time and depth (Method: ISO 11277). The sand and clay content were both expressed as a percentage of the fine soil fraction (<2 mm). The SOM content was determined from the clay content and the loss on ignition, which was determined by drying a sample at 105 ± 1 °C, weighing it, incinerating it at 500 °C (±10 °C) for 3 h and then weighing it again after cooling to about 50 °C (original method by Ekström (1927)). Since samples with SOM >20 % coincided well with the areas outlined as organic soil by SGU (see Fig. 1), the corresponding areas were excluded from the modelling. In the Farms dataset we excluded farms with less than nine soil samples. Thus, data from 56 different farms were kept (out of the 82 in the area in total). For the modelling, this left some 98 000 ha of arable land and the remaining number of soil observations in the Grid dataset was 446, while 1968 observations remained in the Farms dataset.

Applied pedotransfer functions

Texture and SOM data are often not directly applicable for management decisions in the field but can be used as input to pedotransfer functions (PTF; Bouma 1989), e.g. for prediction of important soil functional properties (Arrouays et al. 2014) or other measures used in practice, such as leaching risk, water availability, seed rate, nitrogen mineralisation or liming requirement. As examples of such properties used in practice by Swedish farmers, here we modelled two properties used for determining the amount of lime needed to adjust the soil pH to a desired level. These were: (1) target pH, which is the suggested suitable soil pH (Jordbruksverket 2013) in a soil with a known clay content and a known SOM content; and (2) the ability of the soil to resist pH changes (buffering capacity), which can also be described by a function of clay content and SOM content. According to official guidelines used in Sweden, both these properties can be estimated through simple equations, derived from tabulated values in Jordbruksverket (2013) and Gustafsson (1999):

$$ {\text{Target}}\;{\text{pH}} = 6.1 + 0.01 \times {\text{Clay}} [\% ] - 0.033 \times {\text{SOM}} [\% ] $$
(1)
$$ {\text{Buffering}}\;{\text{capacity}}\;({\text{t}}\;{\text{ha}}^{ - 1} ) = 1.9 + \left( {(3.5 \times {\text{SOM}} [\% ]) + {\text{Clay}} [\% ]} \right)/3.8 $$
(2)

The buffering capacity in this context is expressed as the amount of lime with 50 % CaO needed to raise the soil pH by one unit. From a practical point of view, if it were possible to model the variation in these properties at farm level with sufficient accuracy, the farmer would only need to provide measurements of soil pH in order to determine the amount of lime required. Target pH and buffering capacity are generally estimated from clay and SOM (Eqs. 1 and 2), but a new approach tested here was to model both these properties directly from the predictors, instead of e.g. calculating them from maps of modelled clay and SOM. Statistical data on clay, sand, SOM, buffering capacity and target pH in the Grid reference dataset (Table 1) and in the Farms dataset (Table 2) are summarised below.
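As a minimal sketch of how Eqs. 1 and 2 could be applied in practice, the Python functions below compute target pH and buffering capacity from clay and SOM percentages. Two assumptions are made that are not stated in the text: Eq. 2 is read as 1.9 + (3.5 × SOM + Clay)/3.8, and the lime requirement is taken as the buffering capacity multiplied by the desired pH increase. The helper `lime_requirement` is a hypothetical name introduced for illustration only.

```python
def target_ph(clay_pct: float, som_pct: float) -> float:
    """Eq. 1: suggested soil pH for given clay and SOM contents (both in %)."""
    return 6.1 + 0.01 * clay_pct - 0.033 * som_pct


def buffering_capacity(clay_pct: float, som_pct: float) -> float:
    """Eq. 2 (assumed bracketing): t/ha of lime (50 % CaO) per pH unit."""
    return 1.9 + (3.5 * som_pct + clay_pct) / 3.8


def lime_requirement(clay_pct: float, som_pct: float, measured_ph: float) -> float:
    """Hypothetical helper: lime (t/ha) needed to reach the target pH.

    Assumes lime need = buffering capacity x pH increase, and returns
    zero when the measured pH is already at or above the target.
    """
    delta_ph = max(0.0, target_ph(clay_pct, som_pct) - measured_ph)
    return buffering_capacity(clay_pct, som_pct) * delta_ph


if __name__ == "__main__":
    # Illustrative values only: a soil with 30 % clay and 3 % SOM at pH 5.8.
    print(round(target_ph(30, 3), 2))
    print(round(buffering_capacity(30, 3), 1))
    print(round(lime_requirement(30, 3, 5.8), 1))
```

With this reading of Eq. 2, only the measured soil pH has to be supplied by the farmer once clay and SOM are known from a map.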

Table 1 Statistical data on variables in the Grid reference soil dataset (n = 446) after removal of samples from organic soils not included in the modelling
Table 2 Statistical data on variables in the Farms reference soil dataset (n = 1968, on 56 farms) after removal of samples from organic soils not included in the modelling

Modelling

Multivariate adaptive regression splines (MARSplines) were used for the modelling (Hastie et al. 2009). This is a form of non-parametric regression in which the data are described by a number of piece-wise linear splines (\( H_{m} \)) that are split at break-points (\( t \)):

$$ H_{m} = \left( {x - t} \right)_{+} = \left\{ \begin{array}{ll} x - t, & x > t \\ 0, & {\text{otherwise}} \end{array} \right. $$
(3)

The clay, sand and SOM content, buffering capacity and target pH (\( \hat{y}_{1..n} \)) were estimated as functions of the predictors (\( x \)); the intercept parameter \( \beta_{0} \) was added to the weighted (by \( \beta_{m} \)) sum of \( M \) basis functions and their interactions (\( k \)):

$$ \hat{y}_{n} = f\left( x \right) = \beta_{0} + \mathop \sum \limits_{m = 1}^{M} \beta_{m} H_{km} (x_{{\left( {k,m} \right)}} ) $$
(4)

The number of interactions was set to 1 (i.e. no interactions were allowed) and a pruning function was used. In the pruning process, basis functions that did not contribute substantially to the prediction model were removed in order to reduce the risk of over-fitting (Hastie et al. 2009).
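The structure of Eqs. 3 and 4 can be illustrated with a short Python sketch of the prediction step only (not the fitting or pruning, which were done in R in this study). The function names, the predictor keys and all coefficient and knot values are hypothetical. The mirrored hinge (t − x)+ is included because MARS implementations normally consider both directions at each knot; Eq. 3 shows only the (x − t)+ form.

```python
def hinge(x: float, t: float) -> float:
    """Basis function of Eq. 3: (x - t)_+ ."""
    return x - t if x > t else 0.0


def mars_predict(x: dict, beta0: float, terms: list) -> float:
    """Additive model of Eq. 4 with no interactions between predictors.

    Each term is a tuple (beta, predictor_name, knot, direction).
    direction > 0 uses (x - t)_+ ; otherwise the mirrored hinge (t - x)_+ .
    """
    total = beta0
    for beta, name, knot, direction in terms:
        value = x[name]
        basis = hinge(value, knot) if direction > 0 else hinge(knot, value)
        total += beta * basis
    return total


if __name__ == "__main__":
    # Hypothetical coefficients and knots, for illustration only.
    terms = [
        (0.8, "th232", 5.0, +1),   # contributes when th232 > 5.0
        (-0.02, "z", 100.0, +1),   # contributes when z > 100.0
        (0.5, "rt50", 0.0, -1),    # contributes when rt50 < 0.0
    ]
    site = {"th232": 7.0, "z": 80.0, "rt50": -2.0}
    print(mars_predict(site, beta0=10.0, terms=terms))
```

Because each term depends on one predictor only, the fitted surface is a sum of piece-wise linear curves, which is what setting the number of interactions to 1 implies.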

For mapping and data management, ArcGIS 10.2 with the extensions Spatial Analyst and Geostatistical Analyst (ESRI Inc., Redlands, CA, USA) was used. The open-source statistical software R (R Core Team 2014) was used for development of calibration models and validations.

Three types of predictor variables were used in the modelling:

(i) Soil information from legacy QD maps, simplified into two versions: six classes (denoted SGU6), which are displayed in Fig. 1, and three classes (SGU3: clay, organic and other).

(ii) Digital elevation data: elevation (z) and derived relative topography on three different scales (RT5, RT50 and RT500).

(iii) Airborne gamma-ray spectrometry: the isotopes 232Th, 40K and 238U.

The elevation data from the Swedish Land Survey (ii) were taken from its high resolution elevation database (2 × 2 m2 grid) covering most parts of Sweden. The spatial resolution of the elevation (z) was reduced to 10 × 10 m2 by mean filtering, to better reflect the support area of soil samples. Three measures of relative topography (RT) were estimated to reflect the landform in three different scales (A):

$$ RT_{A} = z(r) - \overline{z(r)}_{A} $$
(5)

where z(r) is the elevation at a specific lattice point r and \( \overline{z(r)}_{A} \) is the average elevation within a neighbourhood (A) around r. Estimates were made with A equal to 5, 50 and 500 hectares. Maps of elevation and relative topography are displayed in Fig. 2a–d.
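Eq. 5 can be sketched in Python as a moving-window operation on an elevation grid. The shape of the neighbourhood is not specified in the text, so a square window truncated at the grid edges is assumed here, and the helper `half_width_cells`, which converts a neighbourhood area in hectares to a window half-width on the 10 m grid, is hypothetical.

```python
import math


def half_width_cells(area_ha: float, cell_size_m: float = 10.0) -> int:
    """Half-width (in cells) of a square window of roughly area_ha hectares."""
    side_cells = math.sqrt(area_ha * 10_000.0) / cell_size_m
    return max(0, round((side_cells - 1) / 2))


def relative_topography(z: list, half_width: int) -> list:
    """Eq. 5: elevation minus the mean elevation in a window around each cell.

    z is a list of rows (lists of floats). The window is square and is
    truncated at the grid edges (an assumption, not from the source).
    """
    n_rows, n_cols = len(z), len(z[0])
    rt = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            r0, r1 = max(0, i - half_width), min(n_rows, i + half_width + 1)
            c0, c1 = max(0, j - half_width), min(n_cols, j + half_width + 1)
            window = [z[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            rt[i][j] = z[i][j] - sum(window) / len(window)
    return rt


if __name__ == "__main__":
    # A small hill in the middle of a flat 3 x 3 grid.
    grid = [[1.0, 1.0, 1.0], [1.0, 10.0, 1.0], [1.0, 1.0, 1.0]]
    print(relative_topography(grid, half_width=1))
    print([half_width_cells(a) for a in (5, 50, 500)])
```

Positive values mark cells that lie above their surroundings at the chosen scale and negative values mark local depressions. The nested loops are written for clarity and would be too slow for a national grid.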

Fig. 2

Seven of the predictor variables used in the modelling: a Elevation (z); b–d relative topography (RT5, RT50, RT500) of different neighbourhoods: 5, 50, and 500 ha; e 232Th; f 40K; and g 238U. In a–d, higher values are shown in lighter grey. In e–g only arable land is coloured and higher values are darker

The initial gamma-ray data from SGU (iii) were collected during a number of scanning campaigns, in 1977, 1990, 1991, 1993, 2002 and 2003. The data (about 460 000 recordings in total in the Skara project area) were collected along transects approximately 200 m apart, with about one recording every 16–17 m along the flight lines. The nominal flight height was 30 or 60 m, the latter being the current standard. The radiometric data were corrected by SGU for background radiation and variations in flight height. Only the conditions in the uppermost part of the soil are reflected in the gamma-ray readings; the top 30 cm of the soil yields about 90 % of the response (Taylor et al. 2002). A filtering procedure was applied to the data along the flight lines to reduce noise. Ordinary 2 × 2 block kriging (Isaaks and Srivastava 1989) was then used for interpolating the radioelements (Th, U and K) onto a 50 × 50 m2 point lattice, covering the arable land. Raster maps created from the point data are shown in Fig. 2e–g. Arable land was delimited by a detailed polygon map layer (known as “the Block map”) provided by the Swedish Board of Agriculture (Jönköping, Sweden) from the EU subsidies database.

Calibration methods, validations and mapping

Modelling strategies

For calibration, data from the predictor maps were derived for the locations of all soil samples in the Grid and the Farms datasets. Three different modelling strategies were tested for each of the five variables investigated (clay, sand, SOM, buffering capacity and target pH):

(i) Grid: Only the Grid dataset was used for calibration. This was a general DSM model valid for the entire map area, which was applied to the Farms dataset for a regional validation. Validation statistics were estimated both for the whole region and farmwise, i.e. as the average performance on the individual farms. The latter shows how the regional DSM model will perform on individual farms.

(ii) Farm Interactive: The Grid dataset was combined with soil analyses from one farm at a time in the Farms dataset. An extra weight (50-fold) was given to the added farm data. This simulated an interactive map with the possibility for farmers to upload and input available soil analyses from their own land into the calibration models. For validation of the Farm Interactive model, one sample of the added farm data was removed at a time, and a new calibration model was developed and used for prediction of the removed soil sample.

(iii) Current: Simple interpolation (inverse distance weighting) of the soil analyses for each farm in the Farms dataset. This was used since it is current common practice at the farm level in those instances where texture and SOM are analysed at one sample per 3 ha and no regional digital soil map is available. In the Current model, simple leave-one-out cross-validation was used for validation for each of the 56 validation farms.
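The Current strategy and its validation can be sketched as follows. This is a minimal pure-Python illustration of inverse distance weighting with leave-one-out cross-validation on one farm. The distance exponent of 2 and the use of all remaining samples as neighbours are assumptions, and the coordinates and clay values in the example are invented.

```python
import math


def idw_predict(x: float, y: float, samples: list, power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y).

    samples is a list of (x, y, value) tuples; power=2.0 is an assumption.
    """
    num = den = 0.0
    for xs, ys, value in samples:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value  # exact hit: return the observed value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def leave_one_out(samples: list, power: float = 2.0) -> list:
    """Predict each sample from all the others on the same farm.

    Returns a list of (observed, predicted) pairs.
    """
    pairs = []
    for k, (xs, ys, value) in enumerate(samples):
        others = samples[:k] + samples[k + 1:]
        pairs.append((value, idw_predict(xs, ys, others, power)))
    return pairs


if __name__ == "__main__":
    # Invented clay contents (%) at three locations (coordinates in m).
    farm = [(0.0, 0.0, 10.0), (200.0, 0.0, 20.0), (100.0, 0.0, 15.0)]
    for observed, predicted in leave_one_out(farm):
        print(observed, round(predicted, 2))
```

The observed and predicted pairs from each farm can then be summarised with the validation statistics described in the next section.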

Validation statistics

The coefficient of determination (r2) of a linear regression line between predicted (\( \hat{y} \)) and measured values (y) and the root mean squared error (RMSE) were used for assessment and description of the validation results. These statistical measures were derived to depict how the different strategies functioned regionally (Eq. 6) and at farm level (Eq. 7):

$$ RMSE_{general} = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - \hat{y}_{i} } \right)^{2} } $$
(6)
$$ RMSE_{farms} = \frac{1}{56}\mathop \sum \limits_{j = 1}^{56} \left[ {\sqrt {\frac{1}{p_{j}}\mathop \sum \limits_{i = 1}^{p_{j}} \left( {y_{ij} - \hat{y}_{ij} } \right)^{2} } } \right] $$
(7)

where \( n \) is the total number of validation samples and \( p_{j} \) denotes the number of samples on farm \( j \) of the 56 farms.
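The difference between the regional statistic (Eq. 6) and the farmwise statistic (Eq. 7) can be made concrete with a short Python sketch. The data layout (a dictionary mapping a farm identifier to lists of observed and predicted values) and the function names are assumptions made for illustration. The r2 is computed as the squared Pearson correlation, which equals the coefficient of determination of a simple linear regression between predicted and measured values.

```python
import math


def rmse(observed: list, predicted: list) -> float:
    """Root mean squared error of one set of samples."""
    n = len(observed)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(observed, predicted)) / n)


def rmse_general(farms: dict) -> float:
    """Eq. 6: pool the samples of all farms, then compute one RMSE."""
    obs = [y for o, _ in farms.values() for y in o]
    pred = [yh for _, p in farms.values() for yh in p]
    return rmse(obs, pred)


def rmse_farms(farms: dict) -> float:
    """Eq. 7: compute the RMSE per farm, then average over farms."""
    return sum(rmse(o, p) for o, p in farms.values()) / len(farms)


def r_squared(observed: list, predicted: list) -> float:
    """r2 of a simple linear regression between predicted and observed."""
    n = len(observed)
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    sxy = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    syy = sum((y - mean_y) ** 2 for y in observed)
    spp = sum((p - mean_p) ** 2 for p in predicted)
    return sxy ** 2 / (syy * spp)


if __name__ == "__main__":
    # Invented values: farm A is predicted perfectly, farm B poorly.
    farms = {"A": ([1.0, 2.0], [1.0, 2.0]), "B": ([0.0, 0.0], [3.0, 4.0])}
    print(rmse_general(farms), rmse_farms(farms))
```

In the example, the pooled RMSE is dominated by the poorly predicted farm, whereas the farmwise average gives each farm equal weight regardless of its number of samples.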

Mapping

Since all predictors (x in Eq. 4) had been estimated for the 50 × 50 m2 point lattice, maps could easily be produced for all calibrated and validated models. In this process, the following boundary conditions were used to avoid unrealistic predictions: predicted values were not allowed to fall below the minimum or exceed the maximum value in the calibration set, and the sum of clay + sand content was not allowed to exceed 100 %.
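A minimal Python sketch of these boundary conditions is given below. The text does not say how a clay + sand sum above 100 % was corrected, so proportional rescaling of both fractions is assumed here. The function names and the calibration ranges in the example are hypothetical.

```python
def clamp(value: float, lower: float, upper: float) -> float:
    """Restrict a prediction to the range seen in the calibration set."""
    return max(lower, min(upper, value))


def apply_boundaries(clay: float, sand: float,
                     clay_range: tuple, sand_range: tuple) -> tuple:
    """Clamp clay and sand (%) and keep their sum at or below 100 %.

    Assumption: if the sum exceeds 100 %, both fractions are scaled down
    proportionally. Note that this rescaling could in principle push a
    value slightly below its calibration minimum.
    """
    clay = clamp(clay, *clay_range)
    sand = clamp(sand, *sand_range)
    total = clay + sand
    if total > 100.0:
        clay, sand = clay * 100.0 / total, sand * 100.0 / total
    return clay, sand


if __name__ == "__main__":
    # Hypothetical calibration ranges: clay 1-60 %, sand 2-95 %.
    print(apply_boundaries(-3.0, 50.0, (1.0, 60.0), (2.0, 95.0)))
    print(apply_boundaries(70.0, 90.0, (1.0, 60.0), (2.0, 95.0)))
```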

To exemplify how these types of regional digital soil maps would perform at farm level if used for precision application of lime, variable-rate application maps were produced and compared for one farm (Entorp Farm) located in the transition zone between the more sandy central part of the study area and the clay-dominated area in the west (see Fig. 1). For comparison, an even more detailed map of this farm produced by proximal sensing was used as ‘ground truth’. The proximal sensors and mapping procedure used on Entorp Farm are described in detail in Piikki et al. (2014).

Results

Maps of predictors derived from airborne laser scanning (Fig. 2a–d) revealed the general westerly topographical slope (Fig. 2a) and the range of table mountains in the eastern part of the study area (Fig. 2b–d). Geomorphological features such as old shorelines, mesas, incised streams, terminal moraines and glaciofluvial deposits were more or less pronounced in the different landform maps. The overall spatial pattern in the radiometric maps of 232Th and 40K (Fig. 2e–f) showed some similarities to each other, with the highest values found in the part dominated by clay soils (see Fig. 1).

Uranium showed a different geographical pattern (Fig. 2g). Very high 238U values were located in areas around the table mountains in the eastern part of the study area. This is related to the occurrence of alum shale fragments in till and glaciofluvial deposits emanating from bedrock deposits in the table mountain area, reflecting the NE–SW movement of the inland ice. The high 238U values in the east-central part of the area coincided with the existence of kames and kame deltas along the western piedmont areas of the table mountains. Most of the organic soils, which have very low 40K since peat efficiently attenuates radiation from K, were located distally from this part (cf. Figs. 1, 2f). While not evident in Fig. 2g, there was substantial small-scale noise in the 238U map, which negatively affected the performance of the calibration models in many cases. 238U was therefore removed from the modelling.

Validation statistics for the different types of calibration models are shown in Fig. 3 (regional and farmwise statistics) and Fig. 4 (improvements with the Farm Interactive method). As is shown in Fig. 3, the Grid model performed particularly well for clay (r2 = 0.78; RMSEP = 6.5 %). The corresponding figures for sand were r2 = 0.72 and RMSEP = 13.2 %. For SOM, however, the model did not perform as well (r2 < 0.2), although the variation in SOM was very small, with most values within the range 2.5–4.5 % (Tables 1, 2). For the PTF variables, buffering capacity and target pH, in which SOM was included in the estimated reference values (Eqs. 1 and 2), all models produced fairly high r2 and low RMSE values, indicating that SOM only influenced prediction of these variables to a limited extent. The r2 values of the farmwise validations were less satisfactory but the RMSEs were often lower (Fig. 3). This means that the soil properties are estimated well on average at the farms, but that the variation within each farm is not captured equally well. Note, though, that the variation between farms is considerable, as indicated by the whiskers in Fig. 3. Nevertheless, the regional Grid model performed better on average for all modelled properties, except for SOM, compared with the currently used farm mapping approach (comparing Farmwise validation and Current in Fig. 3).

Fig. 3

Overall validation statistics (r2 and RMSE [labels]) obtained for 1968 soil samples from 56 farms in the Skara area. Regional: The Grid model was applied to the Farms dataset; Farmwise: As the former, but statistics were estimated for each farm and averaged; Current: Traditional soil mapping through inverse distance weighting. The whiskers depict variation between farms (the 25th and 75th percentiles)

Fig. 4

Improvements of farmwise validation statistics (r2 and RMSE obtained for 56 farms) when adopting the Farm Interactive approach compared to the Grid approach

The Farm Interactive approach resulted in locally improved mapping accuracy (lower RMSE) on 60–80 % of the farms, depending on the soil property, with the largest improvement for SOM (Fig. 4). For SOM, about two-thirds of the farms also had higher r2 (Fig. 4). For the other soil properties, r2 was higher in around 40 % of the cases (Fig. 4).

Maps of the entire Skara study area with 50 × 50 m2 spatial resolution produced with the Grid calibration deployed on the predictor variables are shown in Fig. 5. As mentioned earlier, there are three different soil texture subareas in the study area: (a) one dominated by clay soils in the west, (b) one with more sandy soils in the centre and (c) one where glacial till dominates in the east. At regional level, there was a clear resemblance between the spatial pattern of these maps and SGU’s general QD map (cf. Fig. 1). In subarea (a), the clay content is often in the range 25–40 % and slightly lower close to streams. In the central (b) area, clay content is lower than 10 % and the sand concentration is around 60–80 %. In (c) the SOM is generally higher, clay content is lower than 20 % and the sand content is in the range 40–60 %. As shown in Fig. 3, the SOM predictions from this calibration did not provide much detail, but merely gave a general outline of areas with different levels of organic matter content. The maps of buffering capacity and target pH largely followed the spatial pattern of the texture maps. Target pH was around 6.0–6.1 in the central and eastern subareas (b, c) and generally >6.3 in area (a) with its high clay content. The buffering capacity varied widely, with the liming requirement to increase the pH by one unit ranging from around 16 metric tonnes (t) ha−1 in the clayey subarea (a) to 6–8 t ha−1 in the sandy part (b) and approximately 10 t ha−1 in the subarea dominated by glacial till (c).

Fig. 5

Output prediction maps based on the Grid modelling strategy. Organic soils are displayed in brown. The small square in the buffering capacity map (d) denotes the location of Entorp Farm, which was used as an example of local performance of the digital soil mapping methods described in Figs. 6 and 7 (Color figure online)

The way in which MARSplines models are parameterised means that some of the originally provided predictors may not be included in the final model. The predictors included in the final MARSplines models for the Grid dataset were:

Clay: 232Th, 40K, z, RT5, RT50, RT500

Sand: 232Th, 40K, z, RT500, SGU6Silt, SGU3Other

SOM: 232Th, 40K, z, RT50, SGU3Other

Buffering capacity: 232Th, 40K, z, RT5, RT50, SGU3Other

Target pH: 232Th, 40K, RT500, SGU3Other

Two predictors were used in models for all variables: thorium (232Th) and potassium (40K) from the airborne geophysical scanning. Elevation or at least one of the relative topography predictors was also included. The SGU3Other predictor was included in all models except for clay. This class contained all soil classes except those that were classified as clay or organic in the Quaternary deposit map.

Performance in practical precision agriculture

The variable-rate lime application map obtained for the 55-ha Entorp Farm was created from detailed measurements made using proximal soil sensors (as reported in Piikki et al. 2014) and a large number of soil analyses, with a spatial resolution of 20 × 20 m2 (Fig. 6). In this study, we regarded this map as ‘ground truth’. It was assumed that pH should be raised by 0.5 units throughout the farm. The lime requirement for this varied from less than 2 t ha−1 to more than 7 t ha−1. The highest requirement corresponded to a wedge-shaped area with clay and clay loam (30–45 % clay) in the centre of the two fields. This area is surrounded by mostly loamy sand with more than 70 % sand and less than 10 % clay. SOM varies very little at Entorp Farm (slightly less than 2 % on average).

Fig. 6

Lime requirement (t ha−1) to raise the soil pH on Entorp Farm by 0.5 units, as determined from detailed maps derived by proximal soil sensing (Piikki et al. 2014). The area mapped covers 55 ha. The location of the farm within the study area is shown in Figs. 1, 5d (Color figure online)

Buffering capacity data from the 50 × 50 m2 point predictions of the two DSM methods (Grid and Farm Interactive) were extracted for the same area and interpolated to the same grid as in Fig. 6 using ordinary block kriging. For the Current approach, 20 soil samples that were part of the Farms dataset (with a spatial density of slightly more than 1 sample per 3 ha) were interpolated with inverse distance square weighting. Maps of the difference between these maps and the ‘ground truth’ map are shown in Fig. 7 and a compilation of the differences is presented in Fig. 8.

Fig. 7

Differences between the maps produced with the three mapping methods: a Grid; b Farm Interactive; and c Current, and the detailed proximal sensing map in Fig. 6

Fig. 8

Distribution of the differences between the variable-rate lime application maps for Entorp Farm produced with data from different mapping methods (see Fig. 7) and the detailed map in Fig. 6, which was produced with proximal sensing techniques

Compared with both the Farm Interactive and Current methods, the regional mapping method based on the Grid calibration produced maps with less of the area mapped correctly (within ±0.5 t ha−1 of the detailed proximal sensor map) (Figs. 7a–c, 8). The greatest errors of the Grid model occurred in the central area, where the lime requirement was under-predicted. The Farm Interactive and Current approaches were almost equally successful: the former correctly prescribed the lime requirement within ±1 t lime ha−1 in 77 % of the area, whereas the figure for the Current method was 75 %. Taking all lime requirement deviations >1.5 t ha−1 as being entirely misclassified, both the regional method and the Farm Interactive method outperformed the Current method. The percentage of the area misclassified was 9.9 % (Grid), 9.1 % (Farm Interactive) and 14.0 % (Current). Hence, the interpolation of local soil analyses resulted in many areas being correctly classified. However, since the sample density was as low as about 1 sample per 3 ha, a de facto standard in Sweden on those farms which sample for texture at all, a relatively large percentage of the farm became misclassified in the interpolation process. The largest errors occurred in transition zones between soil types.
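The area percentages reported above can be derived from a difference map in a few lines. The sketch below assumes that all grid cells have equal area and uses the ±1 t ha−1 and >1.5 t ha−1 thresholds from the text; the function name `summarise_differences` and the difference values are hypothetical and do not reproduce the Entorp Farm data.

```python
def summarise_differences(differences, correct_tol=1.0, misclass_tol=1.5):
    """Summarise lime requirement differences (t/ha) between two maps.

    Assumes equally sized grid cells, so cell counts are proportional to area.
    Returns the percentage of cells within +/- correct_tol and the percentage
    deviating by more than misclass_tol (treated as entirely misclassified).
    """
    n = len(differences)
    within = sum(1 for d in differences if abs(d) <= correct_tol)
    misclassified = sum(1 for d in differences if abs(d) > misclass_tol)
    return {
        "within_pct": 100.0 * within / n,
        "misclassified_pct": 100.0 * misclassified / n,
    }

# Hypothetical differences (predicted minus reference), one value per cell.
diffs = [0.2, -0.4, 0.9, -1.2, 1.7, -2.1, 0.0, 0.6, -0.8, 1.4]
summary = summarise_differences(diffs)
```

Cells with deviations between the two thresholds fall into neither class, so the two percentages need not sum to 100 %.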

Discussion

A novel approach to digital soil mapping was developed in this study. It comprised an interactive procedure (Farm Interactive) in which additional, available soil analyses from a farm could be used for local augmentation of a regional model. By adding samples from a farm and giving these soil analyses extra weight, the variation in the regional model was better adapted to the local conditions on the farm, resulting in improved validation results compared with other DSM methods and with the current method of interpolating data obtained on the farm itself. In particular, RMSE was lower on most of the farms (for clay on 64 % of the farms, for sand on 80 % and for SOM on 84 %), and r2 was also higher in many cases (30–60 % of the farms).

The average farmwise r2 validation values were lower than the regional r2 values (Fig. 3), mainly because the variation ranges on a number of farms were very small, which yielded low and unreliable determination coefficients. However, the RMSE values were slightly lower for the farmwise validations than for the overall data for all predicted properties. It is interesting to compare the performance of the Current approach, which is what farmers use at present, with that of the DSM methods (Fig. 3). As can be seen, on average the DSM methods performed better than interpolation of soil samples obtained from the farms. The only exception was SOM, for which it proved better to spatially interpolate a few real soil analyses than to estimate SOM from the predictors used, as also concluded in a recent work on modelling of proximal soil sensing data (Piikki et al. 2014). When the Farm Interactive method was used and the local samples were given extra weight, the validation figures often showed an improvement (Fig. 4). This means that local adaptation of regional data mining models can be helpful in deriving useful digital soil maps for precision agriculture.
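The effect of a narrow variation range on farmwise r2 can be shown with a small numerical sketch. It assumes that r2 is the squared Pearson correlation between observed and predicted values, which is not confirmed by the text, and the two farms and all values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)

# Hypothetical clay contents (%): (observed, predicted) per farm.
farms = {
    "wide_range": ([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0]),
    "narrow_range": ([20.0, 21.0, 20.0, 21.0], [21.0, 20.0, 20.0, 21.0]),
}

# Farmwise validation statistics: name -> (RMSE, r2).
stats = {name: (rmse(o, p), r_squared(o, p)) for name, (o, p) in farms.items()}
```

In this constructed case the farm with the narrow range has the smaller RMSE but the lower r2, which illustrates why r2 alone can be an unreliable farmwise measure.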

If a regional digital soil map is available, it is tempting to apply it in precision agriculture at farm level. Many researchers state that this is one of the goals of digital soil mapping (e.g. McBratney et al. 2003). It would obviously be very beneficial, reducing the cost of soil sampling while allowing management strategies to be better adapted to the prevailing conditions. A number of DSM products provide sufficiently high spatial resolution to give the impression that no further soil analyses are required. This is the aim even in global efforts, such as the Global Soil Mapping initiative (Arrouays et al. 2014; Hengl et al. 2014), where the goal is predictions for six soil layers down to 2 m depth at 100 × 100 m2 spatial resolution. However, as noted by Arrouays et al. (2014), the resolution of the cell is not a measure of uncertainty, but rather a geometric framework for estimation. It is of course crucial to assess the uncertainty in the predictions in digital soil maps. Due to the often very large standard error, such data are not suited for use in precision agriculture, although unwary users may be misled by the high spatial resolution. As was shown in this study, however, a regional digital soil map can serve as the basis for improved local estimates. In the case study of Entorp Farm, the locally augmented regional DSM map (through the Farm Interactive approach) resulted in the most accurate liming maps of all the methods tested. The currently applied system of interpolation of relatively sparse on-farm observations may result in erroneous predictions in areas of a farm without soil samples. The severity of this problem depends on the magnitude of the local soil spatial variability. For Entorp Farm, it was shown that the area entirely misclassified was largest when the Current mapping methodology was applied.
In contrast, the general regional soil mapping method (which did not have any calibration points at all within the Entorp area), with local adaptation through the Farm Interactive approach, has the potential to be applicable in precision agriculture. This approach was inspired by work by e.g. Wetterlind and Stenberg (2010), where national models calibrated with spectral libraries were boosted (or spiked) locally by adding local samples and giving them extra weight. The principles of spiking are further discussed in e.g. IUSS Pedometrics Commission (2012). In the present case, a 50-fold higher weight was given to the added local soil samples. In initial tests for the present study, we tested 10-, 25-, 50-, 75- and 90-fold weighting, and 50-fold weighting gave the lowest prediction errors, so it was used in the main study. Whether other weighting procedures could be even more efficient remains to be investigated.
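The principle of giving local samples extra weight can be sketched with a deliberately simple model. The example below replaces the MARSplines models of the study with a weighted straight-line fit, purely for illustration; the predictor, the data values and the function names are hypothetical, and only the 50-fold weight is taken from the text.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = a + b * x; returns (a, b)."""
    sum_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean_abs_error(a, b, x, y):
    """Mean absolute error of the fitted line at the given points."""
    return sum(abs(a + b * xi - yi) for xi, yi in zip(x, y)) / len(x)

# Hypothetical data: a single predictor versus clay content (%).
regional_x = [2.0, 4.0, 6.0, 8.0, 10.0]
regional_y = [8.0, 16.0, 24.0, 32.0, 40.0]  # regional relationship
local_x = [5.0, 7.0, 9.0]
local_y = [25.0, 33.0, 41.0]                # farm deviates by +5 % clay

LOCAL_WEIGHT = 50.0  # 50-fold weight for local samples, as in the text

x = regional_x + local_x
y = regional_y + local_y
w_equal = [1.0] * len(x)
w_spiked = [1.0] * len(regional_x) + [LOCAL_WEIGHT] * len(local_x)

a_eq, b_eq = weighted_linear_fit(x, y, w_equal)
a_sp, b_sp = weighted_linear_fit(x, y, w_spiked)

# Error at the local samples with and without the extra weight.
mae_equal = mean_abs_error(a_eq, b_eq, local_x, local_y)
mae_spiked = mean_abs_error(a_sp, b_sp, local_x, local_y)
```

With equal weights the few local samples are outvoted by the regional data, whereas the spiked fit follows the local offset more closely. In practice the comparison would be made on independent validation samples rather than on the samples used for spiking.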

In this study, a minimum of nine soil observations was used in the Farm Interactive model. In practice, when this model is deployed, it is likely that the current sampling density for texture of 1 sample per 3 ha can be reduced further, although this remains to be determined. Other implementations relying on the relationship between a primary, observed variable and a number of covariables, such as regression kriging or co-kriging, require a large number of observations of the primary variable and therefore cannot be adopted on most farms.

The present study was a preliminary study for a national-scale DSM project on arable soils in Sweden, the Digital Arable Soil Map of Sweden (DSMS), financed by SGU. The DSMS will be made publicly available and will provide farmers with more options than today. Following the results of this study, a farmer can then use the DSMS as it is (Grid method), use their own soil samples to validate and potentially improve the DSMS (Farm Interactive method), or make a soil map based on the soil samples alone (Current method). The second option is often, but not always, the best alternative (Figs. 3, 4). Our recommendation is to use soil samples to validate the three methods and then choose the best one. Simple online tools are needed to facilitate this process, as well as tools to derive prescription files for practical precision agriculture (from the data in this study, e.g. for variable-rate liming or seeding).
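The recommended selection step could be implemented along the following lines. This is a sketch only: it assumes that RMSE on a set of validation samples is the selection criterion, the function names are hypothetical, and the observed and predicted values are invented for illustration and are not results from this study.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def choose_method(observed, predictions_by_method):
    """Return the method with the lowest validation RMSE and all scores."""
    scores = {
        method: rmse(observed, predicted)
        for method, predicted in predictions_by_method.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical validation samples (clay, %) and predictions at the same points.
observed_clay = [12.0, 25.0, 38.0, 18.0]
predictions = {
    "Grid": [15.0, 22.0, 33.0, 22.0],
    "Farm Interactive": [13.0, 24.0, 36.0, 19.0],
    "Current": [14.0, 28.0, 35.0, 16.0],
}

best_method, scores = choose_method(observed_clay, predictions)
```

The validation samples should not be among those used to build the Farm Interactive or Current maps, otherwise the comparison would favour these methods.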

The collection of gamma-ray data by aircraft has been underway for about 50 years in different countries, mostly for geological exploration purposes (Fig. 9). Data from such scans are very useful for mapping texture in the topsoil [for Swedish results see Tranter et al. (2011) and Söderström and Stadig (2012)]. High-resolution gamma-ray sensing from vehicles on the ground, e.g. in northern Europe, has shown that the thorium isotope (232Th) is highly correlated with soil clay content (Van der Klooster et al. 2011; Piikki et al. 2013). Detailed research carried out previously within the Skara study area indicates that 232Th derived from gamma-ray sensing is superior to many other soil sensor data in prediction modelling of soil clay concentration (Piikki et al. 2013). Here, 232Th was passively selected by the data mining method in all calibration models. Depending on the scanning strategy used, airborne gamma-ray spectrometry results in huge datasets, with several complications concerning how the collected data should be filtered and recalculated to raster grids for further processing together with other co-variables. In this study, the 40K and 238U data contained a considerable amount of noise even after the normal filtering procedures, which impaired many of the calibration models. Therefore, these variables were removed from the modelling except in the case of SOM, where 238U provided some useful input. Nevertheless, this indicates that further filtering and noise reduction are needed to make full use of gamma-ray data for detailed digital soil mapping.

Fig. 9

Airborne gamma-ray spectrometry sensing. Further development of methods for noise reduction is needed to improve the accuracy in the models

Most archive data available in Sweden were collected along transects approximately 200 m apart with one recording every 16–17 m along the flight lines, with a flight height of 60 m on average. The present recording density along the tracks has been reduced to one per 70 m (equalling one recording per second following an international de facto standard). Some earlier data were collected at a flight height of 30 m, but at that time the positioning co-ordinates were more uncertain. This mix of collection procedures introduces uncertainty in the data available for use in DSM. In addition, just as in satellite-based remote sensing, each recording is a mixture of the conditions prevailing inside the footprint of the sensor. In airborne gamma-ray scanning, the varying ground conditions in combination with the somewhat diffuse and large footprint (IAEA 2003) make single recordings somewhat fuzzy. However, by taking advantage of the large overlap between recordings, it may be possible to increase local precision in isotope estimates through various un-mixing or filtering procedures (Craig et al. 1999; Billings et al. 2003; Sanderson et al. 2008), making it easier to interpret the conditions on the ground. This is being investigated in an ongoing project conducted by SGU and SLU that aims to improve the quality and detail of the gamma-ray data available for digital soil mapping. Since gamma radiation was found to be crucial in the modelling approach in the present study, that work may bring future improvements in achievable map accuracy.

Conclusions

Regional data from airborne gamma-ray spectrometry, detailed airborne laser-scanning and traditional Quaternary geological soil maps, all available with national coverage in Sweden, were combined and successfully used for detailed (50 × 50 m2 spatial resolution) mapping of topsoil texture through MARSplines modelling.

For the particular application of direct use in precision agriculture, improved accuracy (lower RMSE) was achieved on most farms through a novel approach (here termed ‘Farm Interactive’) in which as few as nine local soil samples from a farm were added to the regional calibration model. Local soil samples can thus be used for validation and adaptation of regional digital soil mapping for precision agriculture, rather than being the sole basis for soil information.

Soil organic matter could not be modelled as accurately as soil texture with the chosen predictors. One reason why the MARSplines calibration models did not manage to reproduce the topsoil organic matter content could be the low variation observed in this parameter, apart from a few high values.

The modelling of two variables used for lime requirement calculations (target pH, buffering capacity) proved to be as good as, or better than, using soil samples obtained at farm level and mapped through interpolation, indicating the potential of regional digital maps for precision agriculture applications. It was found that the Farm Interactive methodology worked at least as well for construction of variable-rate lime application files for precision liming as currently used methods based on soil sampling and interpolation.

Farmwise validation is essential in identifying sufficiently accurate models to be used for precision agriculture purposes. Good validation measures at the regional scale can identify the models that explain the large-scale variation patterns within that region, but give little indication of how well variation patterns within an individual farm or field are modelled.