1 Introduction

Soil organic carbon (SOC) content is a key soil variable because of its influence on soil chemical, physical, and biological processes. Knowledge of the spatial and temporal variation of SOC could help to improve agricultural management concerning carbon sequestration in the context of climate change mitigation. The Paris COP 21 Climate Change Agreement [1] inspired the vision of the "4 per 1000" initiative, which highlights the potential of SOC storage in soils as a method to mitigate climate change. However, the practical application of this concept on a global scale poses a significant challenge due to the intricate considerations involved. Various agricultural practices, such as rewetting peatlands and avoiding land-use changes from pastures to croplands, are elements of this comprehensive issue. Therefore, the initiative underscores a complex and global aspiration to augment SOC stocks rather than suggesting definitive, universally applicable agricultural practices. Field-scale SOC variability has been considered in the delimitation of soil fertility management zones [2, 3], but evaluating agri-environmental measures [4] regarding soil carbon stocks will require continuous monitoring of its spatial variation.

Spatially continuous SOC monitoring at the field scale and the larger landscape scale requires an integrated and strategic approach. While conventional chemical laboratory analysis offers accurate SOC estimates, it is not cost-effective for acquiring a sampling density high enough to produce a comprehensive field map. Therefore, the monitoring strategy needs to employ advanced remote sensing technologies, such as satellite imagery and drone-based sensors, for extensive data collection [5]. Ground-based samples, though less dense, remain necessary for calibration and verification purposes, providing a balance of detailed insight and broad coverage. Sophisticated data analysis methods and modeling tools are also indispensable for interpreting the data and predicting SOC contents with high precision. Regular monitoring over time is crucial for tracking changes, identifying trends, and assessing the impact of various soil management practices on SOC sequestration [6, 7]. The association between Visible and Near Infrared (Vis–NIR) spectra and SOC has been extensively researched [8, 9], which confirms its utility in developing predictive models. Proximal sensing, characterized by its direct or near-direct contact with the soil, provides an intermediate level of precision and resolution [10]. This method bridges the gap between intensive, expensive ground-based lab analyses and more extensive remote sensing techniques.

Remote sensing data, which can be gathered from satellites, airborne sources, or unmanned aerial vehicles (UAVs) [11], further extends the scale of monitoring. Remote sensing multispectral imagery can act as complementary data to SOC estimates, enhancing the resolution and reliability of soil mapping [9, 12]. Through the combined application of these techniques, we can achieve an effective compromise among precision, affordability, and extensive coverage in the monitoring of SOC.

Multispectral and RGB cameras onboard UAVs are cost-effective and have shown high functionality for agronomic applications [13], obtaining high spatial resolution data to cover the entire field [14]. These optical sensors capture spectral reflectance, but several factors can degrade the conditions for data acquisition, including moisture, partial crop and residue cover, shading by soil clods, and soil surface roughness [15]. UAV campaigns at the field scale are conducted at low altitudes with a small field of view, resulting in a high spatial resolution (i.e. cm-scale). However, low-altitude flights may cause image distortion and the appearance of some artifacts [16]. The data therefore require correction to transform them from their original format of digital numbers to surface reflectance. There are different approaches to correcting and validating UAV data; most frequently, researchers have used black-and-white targets [17] or proximal spectral measurements in the field [16].

Once the sensor data and conventional laboratory measurements are acquired, models are developed to link the spectral information to the SOC content. Sensor data, gathered through drone-based sensors, provide a broad, spatially extensive view of the scene. On the other hand, conventional laboratory measurements, though more time-consuming and expensive, offer a highly accurate estimation of SOC at specific sample locations. Both types of data play essential roles in building an accurate SOC model, with laboratory measurements providing ground truth data for calibrating and validating models built primarily on sensor data. The application of machine learning algorithms to remote sensing data is widely accepted due to their capacity to model intricate class signatures and accommodate diverse data types without presupposing the data distribution [18]. These methods require parameter tuning to govern the learning process [19]. One limitation of machine learning is its data-driven nature, which may overlook the physical relationship between the spectral data and the response variable [20] and could therefore lead to misconceptions or inaccurate interpretations. Nevertheless, machine learning holds the promise of further investigating data to discover novel relationships [21]. This study aims to investigate the potential of UAV spectral information to monitor SOC at the field scale by (1) developing machine learning models to establish robust relationships between UAV-derived spectral Vis–NIR signals and laboratory measurements of SOC content, (2) applying these models for continuous field-scale SOC prediction, and (3) rigorously testing various data processing and image correction procedures to mitigate the influence of soil surface effects.
Our hypothesis suggests that through the integration of advanced spectral analysis techniques with UAV-derived data, more accurate and reliable monitoring of SOC content can be achieved, thus contributing to improved soil management practices and enhanced environmental sustainability.

2 Material and methods

2.1 Study area

The data collection took place at the Static Fertilization Experiment, located in Bad Lauchstädt, Saxony-Anhalt, Germany (Fig. 1a) (51°24ʹ N, 11°53ʹ E, 113 m above sea level). The climatic conditions of the area are marked by an average yearly rainfall ranging from 470 to 540 mm and a mean annual temperature of about 8.5–9.0 °C. The soil, according to the German soil classification system [22], is identified as Haplic Chernozem, which developed from loess [23]. The topsoil texture alternates between highly clayey silt (Ut4) and highly silty clay (Tu4), as per the German soil survey system [22].

Fig. 1 Study area located in Bad Lauchstädt. a Fertilization treatments and subfield divisions (SF). b UAV orthophoto, crop division of the year 2020, and sampling locations. Coordinate reference system: EPSG 25833

The Static Fertilization Experiment, launched in 1902 by Schneidewind and Gröbler, spans approximately 4 ha [24]. It comprises eight sections and started with a crop rotation of winter wheat, sugar beet, summer barley, and potato. From 2015 onwards, silage maize replaced sugar beet and potatoes in the rotation to minimize the labor required. Each crop was planted in a staggered fashion across different subfields to ensure their simultaneous growth at the experiment site. Every fourth spring, the first subfield is limed with 30 dt ha−1. Starting in 1926, legumes were incorporated into the crop rotation on the eighth subfield (not included in Fig. 1) every 7th and 8th year. The crop rotation for 2020, which coincided with the UAV flight campaign, is shown in Fig. 1b.

There are 288 distinct plots (270 without subfield 8), each differing based on its mineral and organic fertilizer treatments. One-third of each field received either 20 or 30 t ha−1 of farmyard manure, leaving the remaining two-thirds unfertilized. Mineral fertilizers, in varied combinations of N, P, and K, were applied, with certain periods comparing different types of N fertilizers. In 1978, the fourth and fifth subfields were modified to test different fertilizer treatments, involving varying quantities of N paired with adjusted organic fertilizer treatments. For additional details, refer to [25].

2.2 Data acquisition

Data collection included UAV flights with a multispectral camera, Vis–NIR spectral contact measurements, soil sampling, and laboratory SOC analysis. In September 2018, soil samples were acquired at 100 locations at 0–10 cm depth (Fig. 1b). To cover the spatial soil variability arising from the agricultural treatments of the long-term experiment (LTE) without having to sample each plot, a stratified random sampling algorithm was applied to select 50 sampling points for soil sampling and spectral contact measurements. Another 50 soil sampling points were selected using the Kennard-Stone algorithm (see details in [26]). The soil samples were air-dried, sieved (2 mm), and ground before carbon measurement by dry combustion. Total carbon was measured using the elemental analyzer vario EL cube CN (Elementar Analysensysteme GmbH) with 3 replicates per sample. The measured SOC content has a mean value of 19.6 g kg−1 and ranges between 14 and 25 g kg−1, a wide range that reflects the different fertilization treatments.

The UAV flight campaign was conducted in September 2020, after harvest and tillage with a field cultivator, to minimize the crop residues remaining at the soil surface. A MicaSense RedEdge 3 Multispectral Camera (MicaSense Inc.) mounted on a DJI Inspire 2 multicopter was used. The camera has 5 bands sensitive to Vis–NIR spectra (Table 1). Two flights, at altitudes of 50 and 100 m above ground level (AGL), were conducted between 11 am and 2 pm under a clear sky with an average flight speed of 5.5 m s−1, obtaining images with a pixel resolution of 3.5 and 7 cm, respectively. The use of flight planning software ensured that all images were recorded with sufficient overlap for photogrammetric processing, maintaining 85% forward and 65% sideward overlap. Before and after each flight, the "MicaSense radiometric panel" was captured in the field to allow for subsequent reflectance calibration of each image. To optimize the geographic position of the data set, eight Ground Control Points (GCPs) were used, which had previously been measured in the field using a Differential Global Positioning System (DGPS).

Table 1 Spectral bands of the MicaSense camera

Spectral contact measurements to correct the UAV data were taken using an ASD FieldSpec 4 Hi-Res instrument by Malvern Panalytical (hereinafter ASD). The ASD measures the Vis–NIR range (350–2500 nm), with a Full-Width Half Maximum (FWHM) of 3 nm in the Vis and 10 nm in the NIR, and outputs spectra at 1 nm intervals. Field measurements were done after crop harvest, under sunny conditions and on dry soil, in September 2018. The spectra were measured at the soil surface at each sampling point using a 50 × 50 cm frame pointing north, excluding crop residues. Five locations within the frame were measured with 3 external and 25 internal scans each, giving a total of 15 spectra per sampling point.

2.3 Model algorithms

The linear regression model (LM), multivariate adaptive regression splines (MARS), and the cubist regression model (CUBIST) were selected for their suitability and past performance in remote sensing and SOC modeling [27]. The R-packages ‘earth’ [28] and ‘Cubist’ [29] were used to implement these models. They were selected due to their comprehensive functionality, robust performance, and wide usage in the scientific community for similar types of analysis.

2.3.1 Linear model

The LM was selected for its simplicity, interpretability, and efficiency. LM serves as a fundamental statistical technique. Its role in our study is twofold: to provide a readily understandable baseline model and to offer a comparison metric for more complex machine learning models. We utilized LM to establish a relationship between SOC measurements at the sampling sites and the spectral UAV image information.

2.3.2 MARS

MARS [30] is a non-parametric regression technique, adept at modeling complex non-linear relationships. Known for its accuracy and adaptability, MARS skillfully navigates multiple predictors to unveil non-linear interactions.

MARS constructs a relationship between the dependent (response) and independent (predictor) variables using a set of coefficients and basis functions. The fitting proceeds in two stages. Initially, piecewise linear or cubic splines are constructed as basis functions, deliberately overfitting the data. Subsequently, these basis functions are pruned, that is, reduced based on the optimal fit to the data. Pruning eliminates superfluous terms that contribute to overfitting, thereby improving the model’s generalizability to unseen data [31].

In this research, we tuned specific MARS parameters, including the maximum number of terms retained after pruning (tested from 2 to 100) and the degree (tested from 1 to 3). In the ‘earth’ package, the degree sets the maximum degree of interaction between basis functions: a degree of 1 yields an additive model of piecewise linear terms, while higher degrees allow products of basis functions that can capture more complex, non-linear relationships among predictors.
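As a minimal sketch of the basis functions described above, the following Python snippet builds a mirrored pair of hinge functions at a single, hand-picked knot and estimates their coefficients by least squares. The knot location, the synthetic data, and the function names are illustrative assumptions; the knot search and the pruning performed by the ‘earth’ package are not reproduced here.

```python
import numpy as np

def hinge_pair(x, knot):
    """Return the mirrored hinge pair max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def fit_single_knot(x, y, knot):
    """Least-squares fit of y = b0 + b1 * h_plus + b2 * h_minus for a fixed knot."""
    h_plus, h_minus = hinge_pair(x, knot)
    design = np.column_stack([np.ones_like(x), h_plus, h_minus])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design @ coef

# Hypothetical, noise-free response whose slope changes at x = 0.4
x = np.linspace(0.0, 1.0, 51)
y = np.where(x < 0.4, 2.0 - 1.0 * (0.4 - x), 2.0 + 3.0 * (x - 0.4))
coef, fitted = fit_single_knot(x, y, knot=0.4)
```

With the knot placed at the true breakpoint, the two hinge terms reproduce the change of slope exactly; a full MARS fit would search over candidate knots and predictors before pruning.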

2.3.3 CUBIST

CUBIST is a rule-based model that combines decision trees with linear regression models [32]. A tree is grown with linear regression models in its terminal leaves. These models are constructed using the predictors from earlier splits. At every level of the tree, there are additional intermediate linear models. At each terminal node of the tree, a prediction is made using the linear regression model, but it is smoothed by accounting for the prediction from the linear model in the preceding node of the tree. This unique amalgamation results in an excellent balance between interpretability and predictive power. Capable of effectively managing large datasets and multiple variables, CUBIST presents a suitable option when dealing with the intricate nature of remote sensing data.

Distinguished from other decision tree algorithms, CUBIST utilizes specialized procedures for model smoothing, rule generation, and pruning. A unique characteristic of CUBIST is the optional committees feature, a form of boosting procedure designed to enhance the model’s predictive accuracy. A series of rule-based models can be generated to establish model committees. Based on the previous model fit, the training set outcome is modified, and a new set of rules is then constructed using this pseudo-response. Additionally, the model’s prediction can be fine-tuned based on the characteristics of neighboring data points. In the pruning process, CUBIST employs a weighted linear combination of two decision trees, with the weights computed based on the residuals of each tree, thereby optimizing the model’s simplicity without sacrificing accuracy [33].

In our study, we tuned specific parameters in the CUBIST model, including the number of neighbors (evaluated from 0 to 10) and the size of committees (tested within a range of 1 to 100). Neighbors refer to the number of nearest data points considered in adjusting the prediction, while committees denote the number of decision trees used in the boosting procedure to enhance the model’s performance.
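The residual-based weighting mentioned above can be illustrated with a short sketch. It assumes a weighting in which the prediction of a child model and that of its parent are combined as a × pred_child + (1 − a) × pred_parent, with a chosen from the training residuals of both models so that the variance of the combined residual is minimal. The function names and the synthetic residuals are hypothetical, and the snippet is not the implementation used in the ‘Cubist’ package.

```python
import numpy as np

def smoothing_weight(res_child, res_parent):
    """Weight a that minimises Var(a * res_child + (1 - a) * res_parent)."""
    var_parent = np.var(res_parent)
    cov = np.mean((res_child - res_child.mean()) * (res_parent - res_parent.mean()))
    var_diff = np.var(res_parent - res_child)
    return (var_parent - cov) / var_diff

def smoothed_prediction(pred_child, pred_parent, a):
    """Linear combination of the child and parent model predictions."""
    return a * pred_child + (1.0 - a) * pred_parent

# Hypothetical training residuals of a parent model and a correlated child model
rng = np.random.default_rng(0)
res_parent = rng.normal(0.0, 2.0, 200)
res_child = 0.5 * res_parent + rng.normal(0.0, 1.0, 200)
a = smoothing_weight(res_child, res_parent)
```

The weight leans towards whichever model has the smaller residual variance, so a poorly supported leaf model is pulled towards the model of the node above it.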

2.4 Workflow

The workflow including image processing, model building, and spatial SOC prediction follows the flowchart presented in Fig. 2. The following subsections explain the details.

Fig. 2 Flow chart of processing steps for the SOC estimation. UAV: unmanned aerial vehicle; ASD: ASD FieldSpec 4 Hi-Res instrument; BVIS: brightness in the visible range; SOC: soil organic carbon; LTE: long-term experiment; CUBIST: cubist regression model; MARS: multivariate adaptive regression splines

2.4.1 Image processing and correction

The photogrammetric workflow was carried out using Agisoft Metashape (Agisoft LLC). The initial step of generating multispectral orthophotos involved calibrating the reflectance of each captured image, using respective reflectance calibration images taken both before and after each flight. Subsequently, all images, acquired at altitudes of 50 m AGL and 100 m AGL, were aligned together to achieve the best possible co-registration between the two datasets. To further enhance the alignment, matching features based on fewer than three images were removed and camera alignment was subsequently optimized using Agisoft Metashape’s predefined function. Following this, ground control points were used for manual alignment optimization.

To generate the most detailed Digital Surface Model (DSM) from the available data, images acquired at 50 m AGL were used to calculate a dense point cloud. This point cloud was then used to process the DSM and orthophoto from the 50 m AGL dataset. In a separate process, using the previously generated higher-resolution DSM, an orthophoto was derived from the aligned images taken at 100 m AGL.

Image correction involved several steps. First, a mask based on the normalized difference vegetation index (NDVI) was applied to account for germination; this mask addresses areas with low vegetation cover. Additionally, a mask based on the brightness in the visible range (BVIS) was applied to handle crop residues. Suitable thresholds for both NDVI and brightness were determined from scatter plots of NDVI against brightness. In these plots, we identified clusters representing typical vegetation and outliers such as bare soil or dense vegetation cover. The thresholds were selected to distinguish between typical vegetation and outliers, enabling the masking of areas with low vegetation cover using NDVI and of crop residues using brightness values.
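The masking step can be sketched as follows. NDVI is computed with its standard definition, (NIR − Red) / (NIR + Red). The definition of BVIS as the mean reflectance of the blue, green, and red bands, the direction of the brightness threshold (bright pixels treated as residues), and the quantile values are assumptions made for illustration only; the thresholds actually used are those listed in Table 2.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def visible_brightness(blue, green, red):
    """Assumed definition of BVIS: mean reflectance of the three visible bands."""
    return (blue + green + red) / 3.0

def bare_soil_mask(blue, green, red, nir, ndvi_q=0.90, bvis_q=0.90):
    """True where a pixel is kept: NDVI and brightness at or below their quantile thresholds."""
    v = ndvi(nir, red)
    b = visible_brightness(blue, green, red)
    return (v <= np.quantile(v, ndvi_q)) & (b <= np.quantile(b, bvis_q))

# Four hypothetical pixels: two bare-soil, one vegetated, one bright residue
blue = np.array([0.05, 0.06, 0.05, 0.20])
green = np.array([0.08, 0.09, 0.08, 0.25])
red = np.array([0.10, 0.11, 0.05, 0.30])
nir = np.array([0.15, 0.16, 0.40, 0.35])
keep = bare_soil_mask(blue, green, red, nir, ndvi_q=0.75, bvis_q=0.75)
```

In this toy example the vegetated pixel is rejected by the NDVI threshold and the bright pixel by the brightness threshold, leaving the two bare-soil pixels for modeling.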

In the spectral correction process, the ASD field measurements obtained at each sampling location served as reference values for correcting the spectral response obtained from the UAV imagery. Two distinct approaches were employed for this correction: non-stratified and stratified correction. In the non-stratified correction approach, uniform threshold values were applied for both NDVI and brightness masking across the entire study area. Following the masking step, the spectral correction involved calculating the ratio between the ASD field measurements and the corresponding values obtained from the MicaSense spectral response function. This ratio served as a correction factor to adjust the spectral response obtained from the UAV imagery, aligning it with the reference values obtained from the ASD measurements. The stratified correction approach involved applying different threshold values for each specific crop type present in the study area. This allowed for a more specific correction process, where threshold values were determined based on the specific spectral characteristics of each crop. The spectral correction was then applied based on the point locations corresponding to each crop type, ensuring a more accurate alignment between the UAV-derived spectral response and the reference values obtained from the ASD measurements. Table 2 provides detailed information on the specific quantiles used as thresholds for both the non-stratified and stratified corrections.
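A minimal sketch of the ratio-based correction is given below. It assumes that the ASD spectra have already been resampled to the five MicaSense bands, that the per-point ratios are aggregated by their mean, and that the resulting factor is applied multiplicatively to each band of the orthophoto. These choices, the function names, and the example values are illustrative assumptions rather than the exact procedure of this study.

```python
import numpy as np

def correction_factors(asd_band_refl, uav_band_refl, strata=None):
    """Mean ASD/UAV reflectance ratio per band, optionally per stratum.

    asd_band_refl, uav_band_refl: arrays of shape (n_points, n_bands).
    strata: optional array of labels of shape (n_points,), e.g. crop type.
    """
    ratio = asd_band_refl / uav_band_refl
    if strata is None:
        return ratio.mean(axis=0)
    return {s: ratio[strata == s].mean(axis=0) for s in np.unique(strata).tolist()}

def apply_correction(image, factor):
    """Scale an image of shape (rows, cols, n_bands) by one factor per band."""
    return image * factor

# Hypothetical reflectances at four sampling points and two bands
asd = np.array([[0.10, 0.20], [0.12, 0.24], [0.20, 0.30], [0.20, 0.30]])
uav = np.array([[0.05, 0.10], [0.06, 0.12], [0.20, 0.15], [0.10, 0.30]])
crop = np.array(["maize", "maize", "cereal", "cereal"])

global_factor = correction_factors(asd, uav)        # non-stratified
crop_factors = correction_factors(asd, uav, crop)   # stratified by crop
```

In the stratified case, each factor would be applied only to the pixels belonging to the corresponding crop.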

Table 2 Selected threshold values for masking crop residues and vegetation

2.4.2 Model building

Model building requires the formation of a robust predictor-response dataset, an essential component that lays the groundwork for establishing and tuning the desired models. In our study, this dataset was generated by using the mean pixel value at each sampling point location, and this process was carried out using different search radii: 0.25, 0.5, 1, and 2 m. For the MARS and CUBIST models, grid search methods were employed to determine optimal tuning parameter values. In the case of the LM, we sought to identify the model configuration that yielded the lowest Root Mean Squared Error (RMSE). The process of model training, tuning, and evaluation was executed through a stratified fivefold nested cross-validation, as detailed in [26]. To address potential issues of spatial autocorrelation between test and training sets, neighboring samples within a 5 m distance were grouped into the same fold. The data were stratified (1) with regard to the response variable and (2) with regard to the recently harvested crop. The latter was included because soil surface characteristics differ between maize and cereal crops. This approach not only maintains the integrity of our model evaluation by minimizing the spatial correlation between training and testing datasets but also ensures a balanced representation of the target variable across all test and training sets. Model evaluation was done with 5 repetitions; thus, 25 models were obtained for each dataset. Identical data subdivisions were used to compare models trained on different data (non-stratified and stratified image correction) and by different algorithms (LM, MARS, CUBIST). RMSE and R-squared were used as metrics of model performance, and the Concordance Correlation Coefficient [34] is reported for the average predicted versus observed values.
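The spatial grouping of neighboring samples can be sketched as follows. The snippet links all points that lie within 5 m of each other (single-linkage grouping with a small union-find) and then assigns whole groups to folds at random. The stratification by response value and by crop is omitted for brevity, and the coordinates, function names, and random assignment are hypothetical choices, not the procedure detailed in [26].

```python
import numpy as np

def group_by_distance(coords, max_dist=5.0):
    """Label points so that points within max_dist of each other share a label.

    Groups are formed by chaining (single linkage), so two points can share a
    label through intermediate neighbours.
    """
    n = len(coords)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.hypot(*(coords[i] - coords[j])) <= max_dist:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels

def assign_folds(labels, n_folds=5, seed=0):
    """Randomly assign whole groups to folds; returns one fold index per point."""
    rng = np.random.default_rng(seed)
    groups = rng.permutation(np.unique(labels)).tolist()
    fold_of_group = {g: k % n_folds for k, g in enumerate(groups)}
    return np.array([fold_of_group[l] for l in labels.tolist()])

# Hypothetical sampling coordinates in metres
coords = np.array([[0.0, 0.0], [3.0, 0.0], [6.0, 0.0],
                   [50.0, 50.0], [100.0, 0.0], [102.0, 1.0]])
labels = group_by_distance(coords, max_dist=5.0)
folds = assign_folds(labels, n_folds=3)
```

Because folds are built from groups rather than individual points, close neighbors never end up on opposite sides of a train/test split.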

2.4.3 Spatial prediction

The models created using LM, MARS, and CUBIST were deployed to predict SOC from the multispectral UAV images (both non-stratified and stratified). These predictions were carried out on images of mean pixel values, with the average computed within the same radii used during model development (0.25, 0.5, 1, 2 m). This strategy was pursued to mitigate the impact of potential image artifacts, shadows, and small-scale variations in soil surface conditions. The gap-filling of pixels removed by masking was done through spatial interpolation, which is commonly applied in remote sensing studies [35]. Specifically, two methods were tested: inverse distance weighting (IDW) for all datasets and ordinary kriging (OK) for the models with the best performance. For OK, the experimental semivariograms were fitted with conventional variogram models: spherical, exponential, and Gaussian [36]. The geospatial analysis was done using the R-package ‘gstat’ [37], and the plots were produced using the R-packages ‘ggplot2’ [38] and ‘lattice’ [39].
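The IDW gap-filling can be illustrated with a minimal sketch. It fills each masked (NaN) cell with a weighted average of all valid cells, using weights of 1 / distance^power. The power of 2, the use of all valid cells rather than a local neighborhood, and the function name are assumptions; the snippet does not reproduce the settings of the ‘gstat’ implementation used in this study.

```python
import numpy as np

def idw_fill(grid, power=2.0):
    """Fill NaN cells of a 2-D grid by inverse distance weighting of all valid cells."""
    filled = grid.copy()
    valid = ~np.isnan(grid)
    rows, cols = np.nonzero(valid)      # positions of valid cells (row-major order)
    values = grid[valid]                # their values, in the same order
    for r, c in zip(*np.nonzero(~valid)):
        dist = np.hypot(rows - r, cols - c)   # distances in pixel units, never zero
        weights = 1.0 / dist ** power
        filled[r, c] = np.sum(weights * values) / np.sum(weights)
    return filled

# Hypothetical row of predicted SOC values (g kg-1) with two masked pixels
row = np.array([[10.0, np.nan, np.nan, 40.0]])
filled = idw_fill(row)
```

Each gap is pulled towards its nearest valid neighbor, which is why a dense set of close neighbors leaves little room for a more elaborate interpolator to differ.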

3 Results and discussion

3.1 Performance metrics

Performance metrics of the LM, MARS, and CUBIST models trained with the various predictor-response datasets are presented in Fig. 3. Compared to the non-stratified image correction, the stratified method showed a slight improvement in the performance of the best model. Similar trends were observed in RMSE and R2. Accuracy increased with the larger search radius of 2 m for the predictor-response dataset. Conversely, accuracy decreased and dispersion increased when the predictor-response data were obtained with a smaller search radius. Among the models, CUBIST performed best (non-stratified image correction: R2 = 0.53, RMSE = 2.1 g kg−1; stratified image correction: R2 = 0.54, RMSE = 1.9 g kg−1), followed by LM and MARS. Positive results of CUBIST in remote sensing applications have also been reported by other studies [40,41,42]; meanwhile, MARS did not perform better than a simple LM. The good performance of CUBIST could be due to its ability to predict values outside the range of the training response and to establish both linear and non-linear relationships [43].

Fig. 3 Predictive model performance of the different models. a RMSE; b R-squared. NST: non-stratified correction; ST: stratified correction

Regarding the predictive performance of the three model algorithms, it becomes apparent that the use of corrected images yields better performance in comparison to non-corrected images (R2 = 0.35, RMSE = 2.3 g kg−1). This finding aligns with several other studies in this emerging field of SOC predictions using UAV images at the field scale. For instance, [44] obtained an RMSE of 2.1 g kg−1 with loamy soil using Support Vector Machines, while [11] and [13] reported RMSE values of 2.7 and 2.9 g kg−1, respectively, for their best models in agricultural fields.

In our study, the RMSE values obtained were higher compared to the performance of ASD laboratory measurements (R2 = 0.9, RMSE = 0.9 g kg−1) and field measurements (R2 = 0.77, RMSE = 1.4 g kg−1) observed by [45], and Veris on-the-go field measurements (R2 = 0.84, RMSE = 1.24 g kg−1) observed by [46] at the same study area. A comparison of the average predicted values with the measured values, using a predictor response dataset with a 2 m radius, is displayed in Fig. 4. The concordance correlation coefficient supports the strong performance of the CUBIST models, even though values in the lower range tend to be overestimated. Upon examining the residuals (Fig. 5), no strong differences in variances are evident when comparing methods using a predictor response dataset with a 2 m radius. However, when comparing the largest positive and negative values of LM with MARS and CUBIST, the largest negative residual values appear consistent, while a noticeable shift is observed in the largest positive residual values.
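For reference, the three evaluation measures can be sketched as below. RMSE and Lin's concordance correlation coefficient follow their standard definitions; computing R-squared as 1 − SS_res / SS_tot is an assumption, since it may alternatively be reported as the squared correlation between predictions and observations. The example values are hypothetical.

```python
import numpy as np

def rmse(obs, pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    return float(2.0 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2))

# Hypothetical SOC values (g kg-1); low values overestimated, high values underestimated
obs = np.array([14.0, 18.0, 20.0, 22.0, 25.0])
pred = np.array([16.0, 18.0, 20.0, 21.0, 23.0])
```

Unlike the correlation coefficient, the concordance correlation coefficient penalizes both a shift in the mean and a compression of the predicted range, so it falls below 1 whenever predictions depart from the 1:1 line.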

Fig. 4 Predicted versus measured observations using a response dataset with a 2 m radius. Non-stratified image correction: a LM, b MARS, c CUBIST. Stratified image correction: d LM, e MARS, f CUBIST

Fig. 5 Measured versus residual values using a response dataset with a 2 m radius. The 10 largest positive and negative residual values of the linear model (non-stratified and stratified) are marked for comparison with the corresponding MARS and CUBIST models. Non-stratified image correction: a linear model, b MARS, c CUBIST. Stratified image correction: d linear model, e MARS, f CUBIST

3.2 SOC spatial prediction

The impact of organic fertilization (Fig. 1a) is evident in the spatial distribution of SOC. The eastern part of the field, which did not receive farmyard manure, displays lower SOC values. Conversely, the differences in SOC between the sections of the field treated with 20 or 30 t ha−1 of farmyard manure are less significant. Concerning mineral fertilization, its influence on SOC is less apparent. However, plots that did not receive fertilization tend to present lower SOC values compared to those where NPK was applied.

Figure 6 illustrates the spatial predictions from each model using mean pixel values within a 2 m radius. In the image corrected by stratification, the division between plot treatments is more clearly defined in the LM and MARS models (it is slightly less apparent with CUBIST). In contrast, the non-stratified corrected image shows a more homogeneous or smoothed spatial distribution of SOC. This smoothed pattern is especially discernible in the LM and CUBIST models, attributable to the reduction of pixel-level variability through mean pixel values. On the other hand, the MARS model tends to lean more towards higher SOC values, suggesting it may capture more of the smaller-scale variability in SOC. The spatial distribution pattern of the SOC is similar to the observations made by [46], particularly for the LM and CUBIST models. They used a Veris on-the-go spectrometer in the same study area and combined PLSR with ordinary kriging models to predict at a 1 m resolution.

Fig. 6 Spatial prediction of the different models using a mean pixel value of 2 m. Pixel gaps are filled using inverse distance weighting. Non-stratified image correction: a LM, b MARS, c CUBIST. Stratified image correction: d LM, e MARS, f CUBIST

Although the literature on SOC estimation is growing, publications that present SOC maps at the field scale using UAV data are still few. Studies by [13, 40, 44] at the landscape scale are some of the few examples. This gap in research underscores the need for additional studies projecting UAV-derived model outcomes to obtain spatial distribution maps of soil properties, namely SOC contents.

In this study, we also examined the effect of image pixel averaging based on different radii, with an example using the CUBIST model presented in Fig. 7. With a 2 m radius, the SOC spatial distribution appears more consistent between the two corrected images. This consistency diminishes as the radius used for mean pixel value calculation decreases, increasing the dispersion of SOC values, and reducing the clarity of the spatial pattern.

Fig. 7 SOC spatial prediction using the CUBIST model with different mean pixel values. Pixel gaps are filled using inverse distance weighting. Non-stratified image correction: a 2 m, b 1 m, c 0.5 m, d 0.25 m. Stratified image correction: e 2 m, f 1 m, g 0.5 m, h 0.25 m

This result aligns with our expectations, as models using a smaller radius demonstrated higher uncertainty. It also reflects the impact of individual pixels or localized areas with weak SOC relationships, which were not eliminated during image masking. [47] employed a smoothing technique over nine adjacent pixels with Airborne Hyperspectral Imagery (2.5 m pixel resolution) to map soil properties in agricultural fields. Their results showed a more accurate average representation of soil properties, which they attributed to noise reduction and signal improvement. This effect becomes more pronounced at higher pixel resolutions.
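The pixel averaging discussed above can be sketched as a circular focal mean. The snippet averages all valid (non-NaN) pixels whose centers lie within a given radius of each pixel; the treatment of masked pixels and image edges, the function name, and the plain-loop implementation are assumptions chosen for clarity rather than speed.

```python
import numpy as np

def circular_focal_mean(grid, radius_m, pixel_size_m):
    """Mean of valid (non-NaN) pixels within radius_m of each pixel centre."""
    r = int(np.floor(radius_m / pixel_size_m))
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    # Keep only the offsets that fall inside the circular window
    offsets = [(int(a), int(b)) for a, b in zip(dy.ravel(), dx.ravel())
               if np.hypot(a, b) * pixel_size_m <= radius_m]
    n_rows, n_cols = grid.shape
    out = np.full(grid.shape, np.nan)
    for i in range(n_rows):
        for j in range(n_cols):
            vals = [grid[i + a, j + b] for a, b in offsets
                    if 0 <= i + a < n_rows and 0 <= j + b < n_cols]
            vals = [v for v in vals if not np.isnan(v)]
            if vals:
                out[i, j] = np.mean(vals)
    return out

# Hypothetical 3 x 3 image with a 1 m pixel size, averaged within a 1 m radius
image = np.arange(1.0, 10.0).reshape(3, 3)
smoothed = circular_focal_mean(image, radius_m=1.0, pixel_size_m=1.0)
```

With a 7 cm pixel, a 2 m radius averages several thousand pixels while a 0.25 m radius averages only a few dozen, which is consistent with the higher dispersion seen at small radii.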

3.3 Potential use of UAV spectral measurements for SOC monitoring

Our findings emphasize the potential to leverage UAV-derived data for predicting field-scale SOC in a long-term experiment setting, illustrating its utility for the continuous monitoring of SOC variations. The predictive performance of the models is competitive with those reported in similar studies that utilize multispectral UAV data. However, it is important to recognize a noticeable performance gap compared to laboratory spectroscopy. Laboratory spectroscopy generally offers higher accuracy due to controlled conditions, while field spectroscopy, influenced by environmental factors, yields an accuracy in SOC prediction that sits between laboratory and UAV measurements. Nevertheless, the scalability of lab measurements is limited. In contrast, field spectroscopy more closely aligns with the UAV’s perspective and data collection conditions, making it more suitable for field application. The quality of data from UAVs can be influenced by various factors such as the specifics of the camera used, the flight parameters, and prevailing environmental conditions [11]. In our experiment, a flight altitude of 100 m resulted in superior data quality compared to a 50 m flight. This could be due to the increased capture of noise at a higher pixel resolution [48]. The practice of using mean pixel values helped to smooth out the spatial SOC patterns, minimizing the effect of individual pixel variability and thus emphasizing the overall trend within each plot. With a smaller search radius, both the smaller averaging window and the weaker models resulted in an unclear representation of the SOC variability. An alternative for improving the results at these smaller averaging radii could be to apply a model built on a predictor-response dataset with a larger search radius (e.g. predicting on the 0.25 m mean pixel values with models based on the 2 m predictor-response dataset).
Direct prediction at the single-pixel level is more complicated because of artifacts, shadows, and residues, so different approaches can be used to create the field maps. For example, [13] predicted SOC from UAV data at the sampling point locations and then created the SOC map through interpolation, which smooths the spatial pattern but lowers the spatial resolution; [44] used PCA dimensions of the variables to build the model and then used these values for the spatial prediction. The maps produced with 1 m and 2 m mean pixel values represented the spatial variability of the field well and should be sufficient to monitor SOC variation; the potential use of meter-scale satellite spectral data should therefore be considered.
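The averaging of pixel values within a search radius around a sampling point can be illustrated with a minimal sketch. It is not the processing chain used in this study: the function name `mean_within_radius`, the 0.05 m pixel size, and the synthetic reflectance band are assumptions chosen only for illustration.

```python
import numpy as np

def mean_within_radius(band, row, col, radius_m, pixel_size_m):
    """Mean of the valid (non-NaN) pixels within a circular radius of (row, col)."""
    r_px = int(np.ceil(radius_m / pixel_size_m))
    # Clip the search window to the image extent.
    r0, r1 = max(row - r_px, 0), min(row + r_px + 1, band.shape[0])
    c0, c1 = max(col - r_px, 0), min(col + r_px + 1, band.shape[1])
    rows, cols = np.ogrid[r0:r1, c0:c1]
    dist_m = np.hypot(rows - row, cols - col) * pixel_size_m
    window = band[r0:r1, c0:c1]
    # Keep pixels inside the circle and skip masked (NaN) pixels.
    inside = (dist_m <= radius_m) & ~np.isnan(window)
    return float(window[inside].mean()) if inside.any() else float("nan")

# Synthetic example (not study data): flat reflectance plus per-pixel noise.
rng = np.random.default_rng(0)
band = 0.20 + rng.normal(0.0, 0.02, size=(200, 200))
for radius in (0.25, 1.0, 2.0):
    value = mean_within_radius(band, 100, 100, radius, pixel_size_m=0.05)
    print(f"radius {radius} m -> mean reflectance {value:.4f}")
```

A larger radius averages over more pixels, which is the mechanism by which shadows, residues, and sensor noise at individual pixels are expected to have less influence on the value paired with each soil sample.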

SOC spatial prediction at the field scale can be improved by correcting the spectral image with laboratory and proximal sensing measurements. Correction with field ASD measurements, which have been used for model calibration and correction in other studies [16, 49], considerably improved model performance compared to using the original orthophoto (reducing the RMSE by about 18% with the best models). The effect of the plot treatments was visible in the images and was more evident with the stratified corrected image; this effect could hardly be observed when using point measurements combined with geostatistical spatial interpolation. In addition, the high spatial resolution makes it possible to identify spatial variation within the plot treatments, which is commonly not considered with conventional measurements, where an average value is assumed for each plot. To fill the gaps left by removed pixels, IDW and ordinary kriging were applied, but no difference was observed between them for the same corrected image, possibly because of the high density and short lag distance of neighboring pixels, for which a simple distance-weighted linear interpolator such as IDW seems to be sufficient. Nevertheless, it should be noted that the study was conducted in an LTE rather than a conventional agricultural field; because SOC variability is lower in conventional fields, lower model performance could be expected there.
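As an illustration of why a simple interpolator can suffice when the gaps are surrounded by many close neighbors, the following minimal sketch fills masked pixels by inverse distance weighting (IDW). It is an assumption-laden example, not the implementation used in this study: the function name `idw_fill`, the power of 2, the use of the 8 nearest valid pixels, and the synthetic grid are all hypothetical choices. Ordinary kriging is not shown.

```python
import numpy as np

def idw_fill(grid, power=2.0, k=8):
    """Fill NaN cells with an inverse-distance-weighted mean of the k nearest valid cells."""
    filled = grid.copy()
    valid = ~np.isnan(grid)
    known_rc = np.argwhere(valid).astype(float)  # row/col of valid cells
    known_val = grid[valid]                      # same row-major order as known_rc
    if known_val.size == 0:
        return filled                            # nothing to interpolate from
    k = min(k, known_val.size)
    for r, c in np.argwhere(~valid):
        d = np.hypot(known_rc[:, 0] - r, known_rc[:, 1] - c)
        nearest = np.argsort(d)[:k]
        w = 1.0 / d[nearest] ** power            # d > 0 because gap cells are not in known_rc
        filled[r, c] = np.sum(w * known_val[nearest]) / np.sum(w)
    return filled

# Synthetic example (not study data): smooth gradient with about 10% of pixels masked.
rng = np.random.default_rng(1)
truth = np.add.outer(np.linspace(10, 14, 40), np.linspace(0, 2, 40))
gappy = truth.copy()
gappy[rng.random(truth.shape) < 0.10] = np.nan
filled = idw_fill(gappy)
print("max abs error at filled pixels:", np.nanmax(np.abs(filled - truth)))
```

Valid pixels are left unchanged, and each gap is estimated only from its immediate surroundings, which is reasonable when the removed pixels are small holes in an otherwise dense image.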

While the use of UAV-derived data offers substantial potential, it is important to note that the number of variables available for model building is limited. To overcome this limitation, we tested various indices, including the Brightness Index, Modified Soil Adjusted Vegetation Index (MSAVI2), Redness Index (RI), Color Index (CI), Transformed Vegetation Index (TVI), Green–Red Vegetation Index (GRVI), Vegetation Index, Green Normalized Difference Vegetation Index (GNDVI), Normalized Difference Vegetation Index (NDVI), Green Soil Adjusted Vegetation Index (GSAVI), Green Optimized Soil Adjusted Vegetation Index (GOSAVI), and Soil Adjusted Vegetation Index (SAVI) [50]. These indices, which primarily pertain to vegetation and combine visible and infrared bands, were compared to other soil-related indices that incorporate short-wave infrared bands (SWIR; 900–1700 nm). However, these SWIR bands are outside the range of those available with the MicaSense camera used in our study. The importance of visible and infrared spectral bands associated with SOC is well documented [8], and leveraging these could potentially facilitate the development of a more robust model. Despite our efforts, incorporating these indices into our models did not substantially improve our SOC estimations (RMSE of about 2.0 g kg−1 and R2 of 0.55 using CUBIST models for both non-stratified and stratified corrections).
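To make the band combinations concrete, the sketch below computes a few of the listed indices from green, red, and near-infrared reflectance using their commonly published forms. The exact formulations applied in this study follow [50] and may differ; the function name `spectral_indices`, the soil adjustment factor L = 0.5, and the reflectance values are assumptions for illustration only.

```python
import numpy as np

def spectral_indices(green, red, nir, L=0.5):
    """Common band-ratio indices from reflectance values (scalars or arrays)."""
    green, red, nir = (np.asarray(b, dtype=float) for b in (green, red, nir))
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "GRVI": (green - red) / (green + red),
        # SAVI adds the soil adjustment factor L to reduce soil background effects.
        "SAVI": (1 + L) * (nir - red) / (nir + red + L),
        # MSAVI2 replaces the fixed L with a self-adjusting term.
        "MSAVI2": (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
    }

# Hypothetical bare-soil reflectances (not study data).
for name, value in spectral_indices(green=0.12, red=0.15, nir=0.25).items():
    print(f"{name}: {float(value):.3f}")
```

Because all of these indices are built from the same few visible and near-infrared bands, they are strongly related to one another, which is consistent with the limited gain observed when they were added as predictors.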

Nonetheless, the ever-evolving nature of UAV technology offers considerable optimism for the future. With advancements in image quality and the inclusion of additional spectral information, particularly in the SWIR, we anticipate that this technology will provide increasingly accurate SOC estimations [51]. This suggests that the deployment of more sophisticated hyperspectral sensors and the incorporation of further spectral information may be key to enhancing our ability to estimate SOC from UAV data.

Our study supports the use of UAV-derived data for field-scale SOC monitoring. It demonstrates adequate predictive performance, underlining the potential of this approach. Factors such as flight altitude and mean pixel values significantly influence data quality and spatial SOC patterns, emphasizing the need for careful data preprocessing. Notably, spectral image correction using field ASD measurements enhances model performance, improving the accuracy of SOC spatial prediction at the field scale. While the spectral indices tested yielded limited improvements in SOC estimations, ongoing advancements in UAV technology, particularly hyperspectral sensors, offer promise for future gains in accuracy. However, challenges related to image correction and model selection still require further refinement. Moving forward, collaborative efforts will be essential in developing robust SOC monitoring frameworks that support sustainable agriculture and climate change mitigation.

4 Conclusions

This study highlights the effectiveness of spectral UAV-derived data in predicting field-scale SOC within a long-term experiment. The accuracy of these predictions improved significantly with spectral correction via ASD data and the masking of germination aspects and crop residues. Crop stratification offered a distinct delineation of plot divisions, enriching our understanding of SOC field-scale variability. Of all the algorithms tested, the CUBIST model demonstrated the best performance. This was particularly true when a 2 m search radius was used to select average pixels for the predictor response dataset. However, the model’s accuracy decreased as the search radius shrank. Future research should focus on broader agricultural landscapes and conventional agricultural fields with less SOC variability than observed in long-term experiments. The spatial prediction of SOC clearly illustrated the impact of fertilization on plot treatments, but the resulting patterns varied with the model and the type of image correction applied. When data with different mean pixel values were used, the image provided a meter-scale map that effectively represented spatial variability.

These findings contribute to the growing body of knowledge on UAV data applications in soil science and guide future advancements in SOC monitoring. They highlight the potential to utilize the spectral information captured by remote sensing technologies, which has shown a relationship with SOC content. This suggests that field-scale SOC changes could potentially be monitored through satellite remote sensing data.