1 Introduction

China’s coal production was 3.90 × 10⁹ tonnes in 2020, accounting for 50.4% of total world coal production (IEA 2021). Coal mining has caused serious damage to land resources, crop production, and the ecological environment (Shrestha and Lal 2011), and has severely disturbed the soil organic carbon (SOC) pool.

Considering that the SOC pool is the largest potential factor in reducing the carbon emissions of terrestrial ecosystems (Miller et al. 2015; Zhang and Ni 2017), China and other major coal-producing countries need to quantify the disturbance that coal mining causes to the SOC pool, so as to improve the scientific management of the SOC pool in coal mining areas and achieve low-carbon regional land use.

At present, frequent mining activities alter a number of ecological and environmental factors in coal mining subsidence areas, including land subsidence (Liu et al. 2021), surface destruction (Redondo-Vega et al. 2017), soil erosion (Wang et al. 2020; Su 2021), vegetation destruction (Li et al. 2016), and surface runoff and groundwater hydrology (Hu 2021; Song et al. 2021). A change in any of these factors affects the soil carbon pool, so the SOC content in mining areas usually shows strong spatial variability (Cheng et al. 2014; Jun et al. 2015).

In recent years, a considerable amount of work has been conducted on the impact of coal mining on the SOC pool. For instance, Fu (2017) analyzed the distribution of SOC and the labile organic carbon fraction in a typical subsidence wetland. The main factors affecting SOC formation and distribution have also been studied, and subsidence wetlands with different utilization types have been chosen to study the human impact on SOC. Huang (2014) found that the carbon sink of the vegetation-soil system affected by coal mining was reduced in the Xinzhouyao coal mine, Datong Mining Area, Shanxi. To understand carbon dynamics in mine soil, the spatial variation of SOC content was investigated in two types of landscape destroyed by coal mining, i.e., a subsidence slope and a ground fissure site in the Jiaozuo mine area, China (Cheng et al. 2014). Many studies of the SOC pool in farmland ecosystems have also been conducted. Tian (2020) took the Changhe Basin mining area as an example and established a method for estimating the carbon sequestration loss of the farmland ecosystem caused by coal mining, concluding that coal mining has a carbon loss effect on the farmland ecosystem. The carbon loss effect of coal mining on the SOC pool in farmland has also been demonstrated in another study (Xu et al. 2019). These studies show that when the soil in mining areas is damaged, the carbon stored in the soil also decreases substantially. Therefore, understanding the spatial distribution of SOC in mining areas is of great significance for controlling greenhouse gas emissions and for land management in mining areas.

To understand the spatial distribution of SOC efficiently and accurately, various geostatistical methods have been applied to predict SOC. Kriging is one of the most widely used stochastic techniques and is the best linear unbiased estimator in the sense that it minimizes the variance of the estimation error (Dai et al. 2014; Ren et al. 2021). It therefore shows considerable advantages in SOC prediction. However, this method does not consider the relationship between soil properties and environmental factors. To address this shortcoming, SOC prediction models that take environmental factors into account began to develop. These mainly include multiple linear regression (MLR) (Zhang et al. 2017), regression kriging (RK) (Zhang et al. 2012), and the geographically weighted regression model (GWR) (Wang and Wu 2020). Kriging and regression analysis both assume a linear relationship between the target and the environmental factors, but the relationship between soil and environmental factors is usually complicated and nonlinear. To overcome this problem, machine learning algorithms, driven by big data, have been increasingly applied to the spatial prediction of soil organic carbon, such as random forest (RF) (Yuan et al. 2021), support vector machine (SVM) (John et al. 2020), artificial neural network (ANN) (Pudełko et al. 2020) and boosted regression tree (BRT) (Akpa et al. 2016).

Because ANN can automatically learn and analyze the nonlinear relationship between multi-source inputs, researchers have successively applied it to the spatial prediction of SOC and achieved fairly good prediction performance (Lai et al. 2020). For instance, Morais et al. (2021) combined laboratory NIR spectral data with ANN to estimate the SOC content of pasture soils in Portugal. Were (2015) compared the performance of support vector regression (SVR), ANN and RF in predicting and mapping SOC stocks in the Eastern Mau Forest Reserve, Kenya. As a traditional ANN, the radial basis function neural network (RBFNN) can approximate arbitrary functions with arbitrary accuracy owing to its strong nonlinear fitting ability, and is widely used in digital soil mapping. Using RBFNN and a high-precision surface model, Luo (2016) achieved high-precision simulation and prediction of the spatial variation of SOC in the purple soil hilly area of the Mid-Sichuan Basin. Lai et al. (2020) used the RBFNN and its combination with ordinary kriging (RBFNN-OK) to predict the spatial distribution of SOC content, comparing their performance with MLR, RF and ordinary kriging (OK).

In summary, many models and methods have been established for predicting regional SOC and its spatial distribution, but models suited to coal mining subsidence areas with intricate terrain are relatively few, and relevant case studies are lacking. In this paper, the Changhe River Basin was chosen as the study area and the RBFNN was used to predict the spatial distribution of SOC. The prediction precision of this model was compared with that of the conventional kriging model to explore a spatial prediction model suitable for soil organic carbon in coal mining subsidence areas.

2 Materials and methods

2.1 Study area

The study area is the Changhe River Basin (35°30ʹ10ʺ N to 35°38ʹ06ʺ N and 112°40ʹ37ʺ E to 112°46ʹ04ʺ E), located in northwest Jincheng City, Shanxi Province, China, with a total area of approximately 113.224 km². The region includes 48 administrative villages across Chuandi Township and Dadonggou and Xiacun towns. The location is shown in Fig. 1. The study area has a warm-temperate semi-humid continental monsoon climate. The mean annual air temperature, precipitation and sunshine hours are 10.9 °C, 628.3 mm and 2392.8 h, respectively. The area lies on the southeastern edge of the Loess Plateau; the terrain is higher in the west than in the east, with elevations between 723 and 1174 m. The topographic relief is large, showing a geographical pattern of two mountains and a river: the east and west are mountains and hills with complex terrain, and the Changhe River flows from north to south through the middle, forming river valleys in the central region. The main soil type in the area is cinnamon soil, and there is a small amount of meadow soil. Alkaline soil is the main soil type in the hilly area, which is mainly cultivated. Corn, potatoes and wheat are the main crops in the agricultural cultivation area, with wheat and maize grown in rotation, producing three crops over two years.

In this area, coal mines are relatively concentrated, with large coal production, abundant coal resources and good coal quality. Currently, coal seams No. 3, 9 and 15 are mainly mined. There are several coal mines across the region. The area is therefore affected by high-intensity coal mining, and large areas of land have collapsed to different degrees. According to observed data from the mines, after a few decades of mining subsidence, the maximum subsidence is 6500 mm, the maximum inclination is 25.7 mm/m, the maximum horizontal movement is 2840 mm, and the maximum horizontal deformation is 38.2 mm/m. This is therefore a typical study area for coal mining subsidence.

2.2 Soil sampling and analysis

Field sampling was conducted in the Changhe River Basin in July 2015. Because the sampling points should be distributed as uniformly as possible across the study area, the grid sampling method was used. First, the study area was divided into 1 km × 1 km grids. Taking the center of each grid as the center of a circle with a radius of 5 m, 5 points were set along two diagonal lines for each soil layer. In each grid, five subsamples were collected from the 0–20 cm layer and five from the 20–40 cm layer, and the subsamples from each layer were merged into one composite sample (about 1 kg). The coordinates of the sampling points were recorded using a handheld GPS, and the soil samples were brought back to the laboratory (Sun et al. 2018). A total of 106 soil samples were collected from each soil layer; 20 samples were randomly selected as validation samples to assess the accuracy of the SOC prediction model, and the remaining 86 samples were used to build the prediction model. The samples were divided using the “create subset” function of Geostatistical Analyst in ArcGIS 10.0. The distribution of the sampling points is shown in Fig. 1.
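
The random hold-out described above can be illustrated with a minimal Python sketch. It is not the ArcGIS tool used in the study; the function name `split_samples`, the fixed seed and the integer sample IDs are assumptions made only for illustration.

```python
import random

def split_samples(sample_ids, n_validation=20, seed=0):
    """Randomly hold out n_validation samples; the rest are used for modelling."""
    rng = random.Random(seed)      # fixed seed (assumption) so the split is repeatable
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]

# 106 hypothetical sample IDs for one soil layer
training, validation = split_samples(range(1, 107))
print(len(training), len(validation))
```

The same split would be applied separately to the 0–20 cm and 20–40 cm layers.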

Fig. 1 Location of the study area and distribution of soil sampling sites

The soil samples collected in the field were taken back to the laboratory, air-dried and crushed to pass through a sieve with a 2 mm mesh to remove animal and plant residues. The SOC content of each sample was determined using the potassium dichromate (K2Cr2O7) oxidation-titration method. In this process, the sample is heated to 170–180 °C for approximately 5 min so that the SOC is oxidized by potassium dichromate. The excess potassium dichromate is then titrated with standard 0.2 mol/L ferrous sulfate (FeSO4) to determine the SOC content (Guo et al. 2019).

2.3 Analytical thinking

2.3.1 Auxiliary variables

Previous studies have shown that topographic properties (Hao et al. 2002), vegetation (Lemma et al. 2006; Soleimani et al. 2017), climate change (Coxson and Parkinson 1987), land use (Brejda et al. 2001) and other factors have a great impact on the spatial distribution of soil properties. For a particular mining subsidence area, climate change is not the dominant factor affecting soil organic carbon because the region is small, whereas vegetation and land use are important factors.

In coal mining areas, goaf areas form after the coal mining panels are excavated, which destroys the original stress equilibrium of the subterranean strata. The stress transmitted through the strata induces inconsistent deformation of the overburden rock above the goaf areas and deforms the surface in the horizontal and vertical directions, which changes the topography of the mining subsidence area. The surface deformation caused by coal mining is the root cause of SOC changes in mining subsidence areas. The conventional indexes describing surface deformation are subsidence, inclination, curvature, horizontal movement, horizontal deformation, distortion and shear deformation. Firstly, physical changes such as surface subsidence and cracks in the mining area directly damage the vegetation (Xu 2012). Secondly, surface deformation directly affects the intensity with which precipitation erodes the surface soil, thereby affecting the loss and accumulation of SOC (Ren et al. 2018).

Coal mining may alter groundwater systems and surface runoff in subsidence areas, which changes the soil water content and, ultimately, the carbon storage and spatial distribution of SOC in mining subsidence areas. The cracks and collapses formed by coal mining alter the evaporation and infiltration of surface water, water erosion and wind erosion, which further changes the soil water content (Qie et al. 2015; Wu et al. 2019; Mo et al. 2015). Changes in soil water content can in turn affect crop carbon input (Wang et al. 2017a, b) and the characteristics of microorganisms (Chang et al. 2021). In addition, studies have shown that soil water content affects soil enzyme activity and, ultimately, the conversion and cycling of soil nutrients such as carbon, nitrogen and phosphorus (Han et al. 2019).

Land use change, surface subsidence, terrain slope and vegetation coverage induced by coal mining are also important factors affecting the SOC pool in the subsidence area. Mining subsidence creates topographic slopes in the subsidence area. Under the action of rainfall, wind and other external factors, soil is lost to erosion in areas with a lower vegetation fraction, which changes the carbon storage and spatial distribution of SOC in mining subsidence areas. Different terrain factors control the redistribution of surface water and heat and the zonality of vegetation, thereby affecting the accumulation of SOC (Chang et al. 2021; Li et al. 2013; Huang et al. 2018; Meng et al. 2017; Zou et al. 2019). Under different land use patterns and vegetation types, the roots, the quantity and quality of litterfall, and the mineralization rate of SOC differ, resulting in significant differences in SOC content (Chen et al. 2019; Li et al. 2019). In addition, differences in human disturbance, soil structure, physical and chemical properties, and soil microbial communities affect the formation and change of SOC (Huang et al. 2018; Du et al. 2016; Wang et al. 2017a, b).

Based on the above analysis of the factors affecting the SOC pool in mining subsidence areas, and for convenience of data acquisition through GIS and remote sensing (RS), we selected the following factors to predict the spatial distribution of SOC in the subsidence area: (1) indicators representing surface deformation: elevation, vertical curvature, horizontal curvature, topographic relief, slope of aspect (SOA) and slope of slope (SOS); (2) an indicator representing runoff change: the topographic wetness index; (3) indicators representing land use, terrain slope and vegetation coverage: land use type, slope, aspect and the vegetation coverage index.

These spatial factors are used as auxiliary variables in the spatial prediction of SOC (Mueller et al. 2003; Wu et al. 2009; Mishra et al. 2010; Francaviglia et al. 2012), which helps to improve prediction accuracy. Furthermore, with the continuing development of GIS and remote sensing technology, multi-source remote sensing data have shown great advantages in the spatial prediction of soil properties, and obtaining auxiliary variables from GIS and remote sensing data has become more practical (Summers et al. 2011; Sullivan et al. 2005).

Table 1 Terrain factors and the detailed data extraction process

2.3.2 Data acquisition

The digital elevation model (DEM) of the study area with a spatial resolution of 30 m was obtained from the Profession scientific research of public welfare in Ministry of Land and Resources. The calculation of the various environmental factors was based on previous research (Zhang et al. 2010). The ArcGIS spatial analysis tools were used to extract the terrain factors from the DEM data for the study area, including elevation, slope, aspect, vertical curvature, horizontal curvature, the relief degree of land surface, slope of slope (SOS), slope of aspect (SOA) and the topographic wetness index. The detailed extraction process is shown in Table 1.

Landsat 8 images were obtained from the International Scientific Data Service Platform, Computer Network Information Center, Chinese Academy of Sciences. The NDVI and land use type (LUTP) were derived by raster calculation in ENVI 5.1 from the third and fourth bands. The image was acquired in July 2015. A GIS database of the study area was created in ArcGIS, which includes the sample information collected at the sampling points and the ten environmental variables extracted from remote sensing data. The DEM data and calculated NDVI values are shown in Fig. 2 (Dai et al. 2014).

Fig. 2 The main environmental factors in the study area

2.4 Spatial prediction model of regional SOC content using RBF neural network

2.4.1 Prediction model of the RBF neural network

The RBF neural network is a three-layer feedforward neural network with a single hidden layer. The three layers are an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer, each containing a number of neurons. Each input neuron is fully connected to all hidden neurons, and the hidden neurons are connected to the output neurons through a set of weights. Compared with the widely used BP neural network, it has clear advantages in learning speed and parameter setting (Alp et al. 2005). When the RBF neural network is used to predict SOC content, the larger the spread constant, the smoother the fitted function. However, a large spread means that more neurons are needed to adapt to rapid changes in the function, which increases the computational burden, whereas a spread that is too small leads to poor network performance (Wallisch et al. 2014). Therefore, different spread values need to be tested during network design to determine an optimal value (Zhou et al. 2014). Similarly, the more hidden neurons there are, the smaller the prediction error of the model. However, increasing the number of hidden nodes increases the amount of computation: training and testing take longer, the learning rate falls, and the real-time performance of the network in application declines. Moreover, too many hidden nodes may produce over-fitting (Pan 2017). Therefore, before the RBF neural network simulation is run, the spread constant of the radial basis function and the maximum number of hidden-layer neurons (MN) must be tuned, and the spread and MN that give the smallest error are selected.
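
The tuning step described above amounts to a search over candidate (spread, MN) pairs. The following Python sketch shows one way such a search could be organized; the function `select_spread_and_mn`, the candidate values and the toy error surface are hypothetical and are not taken from the study. In practice, `validation_error` would train the network with the given settings and return its error on held-out samples.

```python
from itertools import product

def select_spread_and_mn(spreads, max_neurons, validation_error):
    """Return the (spread, MN) pair with the smallest validation error."""
    return min(product(spreads, max_neurons),
               key=lambda pair: validation_error(*pair))

# Hypothetical error surface, used only to exercise the search.
def toy_error(spread, mn):
    return (spread - 1.5) ** 2 + 0.01 * abs(mn - 30)

best_spread, best_mn = select_spread_and_mn(
    [0.5, 1.0, 1.5, 2.0], [10, 20, 30, 40], toy_error)
print(best_spread, best_mn)
```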

In this study, 11 quantitative environmental factors were used as the network input and the SOC content at the corresponding point was used as the network output, to establish an artificial neural network model expressing the quantitative relationship between the environmental factors and SOC content. The environmental factors enter the network through 11 input neurons: elevation, slope, aspect, vertical curvature, horizontal curvature, the relief degree of land surface, SOS, SOA, topographic wetness index, LUTP and NDVI. The information is then transmitted to the hidden neurons as the vector \(Y = [y_{1}, y_{2}, \ldots, y_{11}]^{\text{T}}\). Each hidden neuron then transforms the input using a transfer function \(\varnothing\).

The functional relationship between each input neuron and the hidden neuron is (Schmitz et al. 2005):

$$h_{\text{t}} \left( {y_{\text{s}} } \right) = \emptyset \left( { - \frac{{\left\| {Y - c_{\text{t}} } \right\|}}{\sigma }} \right)$$
(1)

where \({h}_{\text{t}}\) is the response of the tth hidden neuron, \(Y\) is the input vector, and \(\emptyset \left( {} \right)\) is the transfer function, which in this study is the Gaussian radial basis function. \(\left\| \cdot \right\|\) is the Euclidean norm, \({c}_{\text{t}}\) is the center of the tth neuron in the hidden layer, and \(\sigma\) is the width of the hidden neuron, which can be computed by:

$$\sigma = \frac{{d_{{\max }} }}{{\sqrt {2T} }}$$
(2)

where \({d}_{{{\text{max}}}}\) is the maximum distance between the centers of the hidden neurons and T is the number of hidden neurons.

Finally, the output layer responds to the output of the hidden layer through a linear mapping function, i.e., a linear combination of the hidden-layer outputs through the connecting weights. The formula is:

$$\hat{Z}_{{{\text{ANN}}}} = \mathop \sum \limits_{{t = 1}}^{m} w_{\text{t}} \emptyset _{\text{t}} \left( Y \right)$$
(3)

where \(\hat{Z}_{{{\text{ANN}}}}\) is the estimated value of SOC content, \({w}_{\text{t}}\) is the connecting weight between the hidden neuron and the output neuron, and \({\varnothing }_{\text{t}}\left(Y\right)\) is the response of the tth hidden neuron resulting from all input data.
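
The forward pass defined by Eqs. (1)–(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian is written as \(\exp(-(d/\sigma)^{2})\), which is one common form and is not spelled out in the text, and the centers, weights and function names are hypothetical. At least two distinct centers are needed so that \(\sigma\) is non-zero.

```python
import math

def rbf_width(centers):
    """Eq. (2): sigma = d_max / sqrt(2T), where T is the number of hidden neurons."""
    d_max = max(math.dist(a, b) for a in centers for b in centers)
    return d_max / math.sqrt(2 * len(centers))

def hidden_response(y, center, sigma):
    """Response of one hidden neuron to input y (assumed Gaussian form)."""
    return math.exp(-(math.dist(y, center) / sigma) ** 2)

def rbf_predict(y, centers, weights, sigma):
    """Eq. (3): weighted sum of the hidden-neuron responses."""
    return sum(w * hidden_response(y, c, sigma)
               for w, c in zip(weights, centers))

# Hypothetical two-neuron network on a two-dimensional input
centers = [(0.0, 0.0), (3.0, 4.0)]
weights = [2.0, 1.0]
sigma = rbf_width(centers)                     # d_max = 5, T = 2
estimate = rbf_predict((0.0, 0.0), centers, weights, sigma)
print(sigma, estimate)
```

In the study the input vector has 11 components rather than two; the code is unchanged by the input dimension.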

In MATLAB, the newrb function is called to build the radial basis function network. The call format is:

$${\text{net}} = {\text{newrb}}\left( {{\text{P}},\;{\text{T}},\;{\text{goal}},\;{\text{spread}},\;{\text{MN}},\;{\text{df}}} \right)$$
(4)

where net represents the neural network model to be established; \(\text{P}\) represents the input matrix, which is the matrix Y that contains all the environmental information; T is the target (output) matrix, which here holds the SOC content; goal is a scalar representing the specified mean square error; spread is the spread (width) of the radial basis function; MN represents the maximum number of hidden neurons; and \(\text{d}\text{f}\) represents the number of neurons added between two displays.

2.5 Estimate of residuals by ordinary kriging

The measured value of SOC content was decomposed into two parts: the value predicted by the RBF neural network and the residual value. The formula is defined as:

$${Z}\left( {x_{i} } \right) = \hat{z}_{{{\text{ANN}}}} \left( {{x}_{{i}} } \right) + {r}\left( {x_{i} } \right)$$
(5)

where \(Z\left({x}_{i}\right)\) represents the measured value of SOC content at point \({x}_{i}\), \(\hat{z}_{{{\text{ANN}}}} \left( {{x}_{{i}} } \right)\) represents the predicted value of SOC content at point \({x}_{i}\) by an artificial neural network, and \(r\left({x}_{i}\right)\) represents the residual value.

Using the above formula, the residual value at each sample point was obtained, and the residuals were then interpolated by the ordinary kriging method to obtain the residual value over the whole region. Finally, the predicted SOC content and the spatially predicted residual were added as rasters in ArcGIS 10.0 to obtain the predicted SOC content for the entire region. The formula is:

$$\hat{z}\left( {{x}_{{i}} } \right) = \hat{z}_{{{\text{ANN}}}} \left( {{x}_{{i}} } \right) + {\hat{r}}_{{{\text{ok}}}} \left( {{x}_{{i}} } \right)$$
(6)
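
The decomposition in Eqs. (5) and (6) can be illustrated with a short Python sketch. To keep it self-contained, the residuals are interpolated by inverse distance weighting, which is a simple stand-in and not the ordinary kriging used in the study; all coordinates, values and function names are hypothetical.

```python
import math

def residuals(measured, ann_predicted):
    """Eq. (5) rearranged: r(x_i) = Z(x_i) - z_ANN(x_i)."""
    return [z - z_ann for z, z_ann in zip(measured, ann_predicted)]

def idw_residual(target, points, values, power=2.0):
    """Stand-in residual interpolator (inverse distance weighting, NOT kriging)."""
    num = den = 0.0
    for p, v in zip(points, values):
        d = math.dist(target, p)
        if d == 0.0:
            return v               # target coincides with a sample point
        w = d ** -power
        num += w * v
        den += w
    return num / den

def combined_prediction(ann_at_target, residual_at_target):
    """Eq. (6): final estimate = ANN prediction + interpolated residual."""
    return ann_at_target + residual_at_target

# Hypothetical sample points, measured SOC and ANN predictions (g/kg)
points = [(0.0, 0.0), (2.0, 0.0)]
res = residuals([10.0, 14.0], [9.0, 15.0])
final_mid = combined_prediction(12.0, idw_residual((1.0, 0.0), points, res))
final_at_sample = combined_prediction(9.0, idw_residual((0.0, 0.0), points, res))
print(res, final_mid, final_at_sample)
```

At a sample point the combined estimate reproduces the measured value, which is the behaviour expected from adding an exactly interpolated residual back to the network prediction.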

2.6 Evaluation of the accuracy of the interpolation methods

Based on previous research (Dai et al. 2014; Richard et al. 1991), three error measures, the mean error (ME), mean absolute error (MAE) and root mean square error (RMSE), were selected for accuracy analysis. The formulas for the three indicators are:

$${\text{ME}} = \frac{1}{{n}}\mathop \sum \limits_{{{i} = 1}}^{{n}} \left[ {{\hat{z}}\left( {{x}_{{i}} } \right) - {z}\left( {{x}_{{i}} } \right)} \right]$$
(7)
$${\text{MAE}} = \frac{1}{{n}}\mathop \sum \limits_{{{i} = 1}}^{{n}} \left| {{\hat{z}}\left( {{x}_{{i}} } \right) - {z}\left( {{x}_{{i}} } \right)} \right|$$
(8)
$${\text{RMSE}} = \sqrt {\frac{1}{{n}}\mathop \sum \limits_{{{i} = 1}}^{{n}} \left[ {{\hat{z}}\left( {{x}_{{i}} } \right) - {z}\left( {{x}_{{i}} } \right)} \right]^{2} }$$
(9)

where \(\hat{z}\left( {{x}_{{i}} } \right)\) represents the predicted value at point \({x}_{i}\), \(z\left({x}_{i}\right)\) represents the measured value of SOC content at point \({x}_{i}\), and n represents the number of validation sites. The closer ME is to zero and the smaller the values of MAE and RMSE, the smaller the simulation error of the model and the higher the accuracy.
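
Eqs. (7)–(9) translate directly into code. The following Python sketch uses three hypothetical validation values purely to show the arithmetic; the function names are illustrative.

```python
import math

def mean_error(predicted, measured):
    """Eq. (7): average signed difference (bias)."""
    return sum(p - m for p, m in zip(predicted, measured)) / len(measured)

def mean_absolute_error(predicted, measured):
    """Eq. (8): average absolute difference."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)

def root_mean_square_error(predicted, measured):
    """Eq. (9): square root of the average squared difference."""
    return math.sqrt(
        sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured))

# Hypothetical predicted and measured SOC values (g/kg)
predicted = [10.0, 12.0, 9.0]
measured = [11.0, 12.0, 7.0]
me = mean_error(predicted, measured)
mae = mean_absolute_error(predicted, measured)
rmse = root_mean_square_error(predicted, measured)
print(me, mae, rmse)
```

Because positive and negative errors cancel in ME, it measures bias rather than error magnitude, which is why it is read together with MAE and RMSE.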

3 Results and discussion

3.1 Descriptive statistics of the SOC content

Table 2 Descriptive statistics of the SOC content

The descriptive statistics of the SOC content are shown in Table 2. The SOC content within the 0–20 cm soil layer ranges from 0.64 to 23.30 g/kg, with an average value of 10.64 g/kg and a coefficient of variation of 0.39, indicating moderate variation; the skewness is 0.13 and the kurtosis is 0.14. The SOC content within the 20–40 cm soil layer ranges from 0.25 to 19.97 g/kg, with an average value of 9.34 g/kg and a coefficient of variation of 0.43, also indicating moderate variation; the skewness is 0.23 and the kurtosis is 0.23. These values indicate that the data approximately follow a normal distribution with a slight positive skew. A normal distribution is a prerequisite for kriging interpolation (Liu et al. 2015). Therefore, Table 2 supports the use of kriging interpolation for the SOC content and the residual in this study.

3.2 Geostatistical analysis on the spatial variability of SOC in the mining subsidence area

Table 3 Semi-variance analysis of the SOC content and residual

According to kriging interpolation theory, C0 is the nugget variance, which represents random variation, i.e., the variation jointly caused by experimental error, fertilization, crop variation, management level, and other random factors on a small sampling scale (Sreenivas et al. 2016; Chiles et al. 2009). A large nugget variance indicates that processes on a small scale cannot be ignored. In the Changhe River Basin, the nugget values (C0) for the SOC content were small (Table 3), which indicates that the spatial variations in SOC caused by experimental error, fertilization, crop variation, management level, and other random factors on a small sampling scale were minimal at the regional scale.

C represents the structural variance, i.e., the systematic or maximum spatial variation of a regional variable, which is caused by the soil parent material, terrain, climate, and other structural factors (Sreenivas et al. 2016; Chiles et al. 2009). The climate in the subsidence area remained unchanged before and after coal mining, so the spatial variability in the SOC content was basically caused by mining subsidence and other structural factors due to coal mining. C + C0 is the sill variance (the stationary value that the semivariance function reaches once the lag distance has increased to a certain degree) and represents the total variation in the system. C/(C0 + C) represents the degree of spatial correlation (the proportion of the total system variation that is caused by structural factors). The spatial correlation is poor when this ratio is less than 0.25, moderate when it is between 0.25 and 0.75, and good when it is greater than 0.75.
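
The ratio and thresholds above can be written as a small Python helper. The function names and the example nugget and structural variance values are hypothetical; only the 0.25 and 0.75 thresholds come from the text.

```python
def structural_ratio(nugget, structural_variance):
    """C / (C0 + C): share of the sill explained by structural factors."""
    return structural_variance / (nugget + structural_variance)

def spatial_correlation_class(ratio):
    """Classify spatial correlation using the 0.25 and 0.75 thresholds."""
    if ratio < 0.25:
        return "poor"
    if ratio <= 0.75:
        return "moderate"
    return "good"

# Hypothetical semivariogram parameters: C0 = 1.0, C = 9.0
ratio = structural_ratio(1.0, 9.0)
print(ratio, spatial_correlation_class(ratio))
```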

In the study area, the C/(C0 + C) values of the SOC content in the 0–20 cm and 20–40 cm layers are 0.91 and 0.64, respectively (Table 3). The value for the 0–20 cm layer is greater than 0.75, which indicates that the spatial correlation of SOC in this layer is mainly caused by structural factors such as mining and surface subsidence at the regional scale, while the value for the 20–40 cm layer falls in the moderate range. This is mainly due to the subsidence of the surface, the destruction of the original topography and surface vegetation, and the soil erosion caused by large-scale coal mining. Changes in the physical, chemical and biological properties of soil in mining areas result in the destruction of soil aggregates, nutrient loss, reduced microbial activity, and decreased SOC content.

3.3 Spatial distribution of SOC content

The spatial distribution of the SOC content obtained by the two methods is shown in Fig. 3. The distributions obtained by the two methods are in general consistent. On the whole, the SOC content is relatively low in certain areas west of the Changhe River, and the highest content is concentrated southeast of the river. In the study region, the mining area is mainly located in the western part of the river basin. Large-scale coal mining activities have caused varying degrees of ground subsidence, soil erosion, and vegetation damage. The land in the mining area has been severely damaged, soil fertility has declined and organic carbon has been lost. Therefore, the average SOC content in the western part of the study area is lower than in the eastern part, and the difference is more obvious within the 20–40 cm soil layer. This may be because SOC is mainly derived from the biomass that enters the soil, and the activities in the mining area destroy the soil structure, reduce the soil quality, and reduce productivity, so less biomass is input to the soil than in the unmined area. In addition, the biomass input to the soil is mainly concentrated in the topsoil and decreases with increasing soil depth, so the SOC content of the 20–40 cm layer in the western region is lower than in the eastern region.

The SOC content in surface soil is relatively high along the Changhe River in the middle of the region (Fig. 3a), which is due to better soil moisture conditions along the river. Previous research has shown a positive correlation between soil moisture content and SOC content, and a strong influence of soil permeability and soil moisture content on organic carbon mineralization. Under moist conditions, exogenous organic residues readily decay into small-molecule organic substances that are preserved in the soil, which helps to raise the SOC content. Compared with the soil along the river banks, the surrounding soil has a low water content, good permeability and high porosity, so organic carbon is easily mineralized and decomposed, which is not conducive to the accumulation of SOC. Therefore, the SOC content there is relatively low.

Fig. 3 The spatial distribution of SOC. a 0–20 cm, Direct kriging; b 0–20 cm, RBF Neural Network; c 20–40 cm, Direct kriging; d 20–40 cm, RBF Neural Network

Based on the prediction results, the SOC content predicted by the RBF neural network ranges from 0.58 to 23.75 g/kg within the 0–20 cm soil layer and from 0.55 to 20.37 g/kg within the 20–40 cm soil layer. The SOC content obtained by the direct kriging method ranges from 1.34 to 22.13 g/kg within the 0–20 cm soil layer and from 0.65 to 19.65 g/kg within the 20–40 cm soil layer. The SOC content within the 0–20 cm soil layer is thus slightly higher than that within the 20–40 cm soil layer. This is because the surface soil, with its rich hydrothermal resources and animal and plant remains, promotes microbial decomposition, which is more conducive to the accumulation of organic carbon. The results of the RBF neural network show more information than those of the direct kriging interpolation method, and the changes in local areas are more obvious. This is because the RBF neural network comprehensively considers different geographical factors, especially the changes in topographic factors caused by coal mining disturbance, as well as the spatial correlation of the variables with SOC. In general, strong spatial dependence of SOC is determined by intrinsic variation in SOC, while extrinsic variation controls the variability of the less spatially dependent parameters (Cambardella et al. 1994). Furthermore, because of the coal mining activities in the study area, the terrain changes are more severe and have a greater influence on the spatial distribution of SOC content. The RBF neural network therefore accounts for the effects of various environmental factors on SOC while also taking the spatial structure of SOC into account. At the same time, estimating the residuals by the ordinary kriging method accounts for the random spatial variation at the sample points. Compared with the ordinary kriging method alone, combining kriging with the RBF neural network takes both internal and external factors into consideration and thereby improves the accuracy of the predicted SOC spatial distribution.

A comparison of the prediction results of the two methods in Fig. 3 shows that the RBF neural network reveals more detail than ordinary kriging. Ordinary kriging considers only the spatial correlation of SOC, which makes its prediction unsatisfactory. The range of variability of both the SOC content and the residuals is large (Table 3), indicating that these variables are influenced by other factors over a wide area (Hengl et al. 2004; Takata et al. 2007). Therefore, the spatial distribution of the SOC content obtained by direct kriging is not very accurate, while the RBF neural network, which incorporates environmental factors, is more consistent with the actual state of the study area.

3.4 Accuracy assessment of the prediction methods

The fit of the predicted and measured values (Fig. 4) shows that the determination coefficients R² obtained by the RBF neural network are 0.81 and 0.70 for the two soil layers, much higher than the 0.44 and 0.36 obtained by direct kriging. The determination coefficient R² indicates how well the predicted values fit the measured values: the closer R² is to 1, the higher the fitting accuracy and the better the prediction performance of the model (Nakagawa et al. 2013). In conclusion, compared with the ordinary kriging method, the RBF neural network combined with the kriging method predicts the spatial distribution of SOC content in the mining area more accurately, a result that is consistent with previous studies.
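As a minimal sketch of how a determination coefficient of this kind can be computed from paired measured and predicted values, the two functions below show two common definitions, since the formula behind Fig. 4 is not restated in this section. The function names and the four sample values are hypothetical and are not data from the study.

```python
def r2_from_residuals(measured, predicted):
    """R2 = 1 - SS_res / SS_tot, i.e. agreement with the 1:1 line."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def r2_from_correlation(measured, predicted):
    """Squared Pearson correlation, i.e. R2 of a fitted regression line."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


# Hypothetical SOC values in g/kg, for illustration only.
measured = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 11.0, 9.0, 13.0]
r2_res = r2_from_residuals(measured, predicted)
r2_cor = r2_from_correlation(measured, predicted)
print(round(r2_res, 3), round(r2_cor, 3))
```

The two definitions generally give different values for the same data, so it is worth stating which one a reported R² refers to when comparing methods.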

Fig. 4 The scatter plot of predicted and measured values

Table 4 Prediction accuracy indicators for the two methods

The prediction accuracy indicators of the two methods are shown in Table 4. The ME, MAE and RMSE obtained by direct kriging are 0.12, 0.89 and 1.02 in the 0–20 cm soil layer and 0.89, 1.45 and 1.89 in the 20–40 cm soil layer, all higher than the corresponding values of 0.03, 0.51 and 0.59 (0–20 cm) and 0.58, 0.76 and 1.27 (20–40 cm) obtained by the RBF neural network. ME represents the average deviation of the predictions, and its lower value indicates that the RBF neural network predictions are less biased on average. MAE represents the actual magnitude of the prediction error, and its lower value indicates that the RBF neural network predictions are more consistent with the actual SOC spatial distribution. Furthermore, the RMSE values of both methods are larger than the MAE values, indicating that the error has strong spatial variability (Dai et al. 2014). Overall, the RBF neural network predicts the spatial distribution of SOC in the study area more accurately (Table 4).
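The three indicators can be illustrated with a short sketch. It assumes the usual definitions (ME as the mean of predicted minus measured, MAE as the mean absolute error, RMSE as the root of the mean squared error); the sign convention for ME, the function name and the sample values are assumptions made here for illustration, not taken from the study.

```python
import math


def accuracy_metrics(measured, predicted):
    """Return (ME, MAE, RMSE) for paired measured and predicted values."""
    errors = [p - m for m, p in zip(measured, predicted)]  # assumed sign convention
    n = len(errors)
    me = sum(errors) / n                               # average bias
    mae = sum(abs(e) for e in errors) / n              # average error magnitude
    rmse = math.sqrt(sum(e * e for e in errors) / n)   # penalises large errors more
    return me, mae, rmse


# Hypothetical SOC values in g/kg, for illustration only.
measured = [10.0, 12.0, 8.0, 15.0]
predicted = [11.0, 11.0, 9.0, 13.0]
me, mae, rmse = accuracy_metrics(measured, predicted)
print(me, mae, round(rmse, 3))
```

RMSE can never be smaller than MAE, and the two are equal only when every absolute error has the same size, so the gap between them reflects how unevenly the errors are distributed across the sample points.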

4 Conclusions

In this paper, the SOC content of a mining area was analyzed, and a new method was proposed to predict the spatial distribution of SOC content in the mining area by combining the RBF neural network with the ordinary kriging method. First, the RBF neural network is used to construct the nonlinear mapping between the environmental variables of the mining area and the SOC content, which yields a predicted SOC content. Then, the residuals between the measured and predicted values are calculated and interpolated with the ordinary kriging method. Finally, the interpolated residuals are added to the results of the RBF neural network to obtain the predicted spatial distribution of SOC content in the study area.
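The three steps can be sketched as follows. This is an illustrative outline under stated assumptions, not the implementation used in the study: the Gaussian basis functions, the choice of the first eight samples as centers, the width, the ridge term and the exponential semivariogram with fixed sill and range are all assumed here, and the data are synthetic, with two random covariates standing in for environmental variables.

```python
import numpy as np


def rbf_design(X, centers, width):
    """Gaussian RBF activations for each sample, plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-d2 / (2.0 * width**2)), np.ones((len(X), 1))])


def fit_rbf(X, y, centers, width, ridge=1e-6):
    """Output-layer weights by ridge-regularised least squares (assumed)."""
    H = rbf_design(X, centers, width)
    return np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ y)


def semivariogram(h, sill, corr_range):
    """Exponential model without nugget; parameters are assumed, not fitted."""
    return sill * (1.0 - np.exp(-3.0 * h / corr_range))


def ordinary_kriging(coords, values, targets, sill=1.0, corr_range=0.5):
    """Ordinary kriging of `values` at `targets` (semivariogram form)."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d0 = np.linalg.norm(coords[:, None, :] - targets[None, :, :], axis=2)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = semivariogram(d, sill, corr_range)
    lhs[n, n] = 0.0  # Lagrange row/column enforces weights summing to 1
    rhs = np.vstack([semivariogram(d0, sill, corr_range),
                     np.ones((1, len(targets)))])
    weights = np.linalg.solve(lhs, rhs)[:n]
    return weights.T @ values


def hybrid_predict(X_s, xy_s, y_s, X_t, xy_t, centers, width):
    w = fit_rbf(X_s, y_s, centers, width)                  # step 1: RBF trend
    residuals = y_s - rbf_design(X_s, centers, width) @ w  # step 2: residuals
    trend_t = rbf_design(X_t, centers, width) @ w
    return trend_t + ordinary_kriging(xy_s, residuals, xy_t)  # step 3: add back


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(30, 2))   # sample coordinates
X = rng.normal(size=(30, 2))               # stand-in environmental covariates
soc = 10.0 + 2.0 * np.tanh(X[:, 0]) - X[:, 1] ** 2 + np.sin(6.0 * xy[:, 0])
centers = X[:8]
pred_at_samples = hybrid_predict(X, xy, soc, X, xy, centers, width=1.0)
print(np.max(np.abs(pred_at_samples - soc)))
```

Because kriging without a nugget interpolates exactly, this sketch reproduces the measured values at the sampled locations, which is why prediction accuracy has to be assessed on points that were withheld from model fitting.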

Compared with the ordinary kriging method, this method achieves higher prediction accuracy and is feasible in practice. The conclusions of this study are as follows:

(1) Coal mining activities have greatly disturbed the soil in the mining area and reduced its SOC content, resulting in the loss of soil nutrients. Because coal mining causes land subsidence, soil erosion and vegetation destruction, the main factors influencing the distribution of soil properties in the mining area are topography and vegetation factors, such as slope, elevation, the topographic wetness index and the normalized difference vegetation index. The most important variable for the prediction of SOC is slope. Therefore, when estimating the spatial distribution of SOC in a mining area, these topographic and vegetation factors should be taken into account to improve estimation accuracy.

(2) Compared with direct kriging interpolation, the RBF neural network combined with kriging has a smaller mean error, mean absolute error and root mean square error. Furthermore, its determination coefficient R² between predicted and measured values is higher than that of ordinary kriging. Therefore, the RBF neural network combined with environmental factors is more suitable for predicting SOC content in mining areas that have been severely disturbed by human activities. It can thus serve as a reference for the spatial prediction of soil properties in mining areas and provide a scientific basis for land reclamation and land resource management.