Introduction

Forest biomass is a basic measure for evaluating forest ecosystems and an essential variable for quantifying ecosystem structure and function (Paulo et al., 2012; Rodríguez-Veiga et al., 2019). Because forests are an important part of the carbon cycle, effective forest biomass monitoring helps us understand the interactions between the biosphere and the atmosphere (Pang et al., 2017; Rödig et al., 2017; Zhang et al., 2019). Deciduous broad-leaved forest is one of the most widely distributed forest vegetation types in the world, and it plays an important role in regulating climate and conserving water and soil (Souza & Longhi, 2019). Recently, with ongoing climate change, deciduous broad-leaved forests have come under unprecedented threat (Laurin et al., 2020; Pope et al., 2020). A recent research project, for example, used free satellite data from the GEE platform to study the effects of climate change on rangelands and broad-leaved forests (Orusa & Mondino, 2021). Estimating deciduous broad-leaved forest biomass by remote sensing therefore plays an important role in the study of forest ecosystems and their contribution to the global carbon cycle.

Traditional biomass calculation methods, such as the clear-cutting method (Liu et al., 2020) and the standard wood method (Jiang et al., 2017), involve a heavy workload and high costs. The regression method is also commonly used (Li et al., 2012; Zaki et al., 2018; Zhang et al., 2020). These methods are therefore difficult to apply to forest biomass estimation at large scales (Han et al., 2019; Koju et al., 2019; Rödig et al., 2017; Wan et al., 2018). Remote sensing offers a wide detection range and short update times, so combining remote sensing data with a small set of ground survey samples has become a useful approach to estimating forest biomass at large scales (Gwenzi et al., 2017; Kankare et al., 2013). In temperate and subtropical regions, deciduous forest is the most typical forest type, and the study of deciduous forest biomass change has important implications for climate change (Ghosh & Behera, 2018; Landuyt et al., 2020; Raha et al., 2020). In terms of research data, the biomass of deciduous forest has been estimated mainly from optical remote sensing data and lidar data (Joshi & Dhyani, 2019; Kristen et al., 2018; Wang et al., 2020). In terms of research methods, the relationship between remote sensing information and measured biomass is established mostly by multivariate regression analysis, back-propagation neural networks, and other methods. Biomass estimated by remote sensing is mostly aboveground biomass (Kaba & Abunyewa, 2021; Raj & Jhariya, 2021). Balbinot et al. (2017) analyzed the vertical distribution of aboveground biomass in a seasonal deciduous forest in Rio Grande do Sul state, Brazil. Their results showed that the average dry aboveground biomass was 316.5 Mg·ha−1, and trees with a diameter at breast height greater than 10 cm accounted for over 89% of the biomass. Ni et al. (2019) proposed a method to extract forest canopy height by combining leaf-on and leaf-off UAV stereo imagery, and demonstrated that the extracted canopy height could be used for the inventory of deciduous forest aboveground biomass. Their results showed that forest aboveground biomass maps from UAV stereo imagery were highly correlated with those from lidar data, with R2 ≥ 0.94 and RMSE ≤ 10.0 Mg·ha−1. Ningthoujam et al. (2018) presented a regression-based woody biomass estimation for tropical deciduous mixed forest dominated by Shorea robusta using ALOS PALSAR images and field data at the lower Himalayan belt of Northern India. Many studies show that vegetation indices, texture factors, and topographical variables are important variables for estimating forest biomass by remote sensing (Hojo et al., 2020; Nandy et al., 2017; Senger et al., 2020). Environmental variables (e.g., rainfall, humidity and soil) can affect the horizontal distribution of species biomass (Fu et al., 2019). Additionally, some forest parameters, such as stand age, leaf area index and canopy closure, can also improve the accuracy of biomass estimation (Li et al., 2020a, 2020b, 2020c; Peng et al., 2019; Zhang et al., 2020).

In the era of big data, machine learning algorithms have begun to show their advantages in building prediction models and in assessing the importance of characteristic variables (Abid, 2021; Li et al., 2020a, 2020b, 2020c). Some of these algorithms do not require the sample data to follow a particular distribution and can handle high-dimensional variables (Gumma et al., 2020). Compared with traditional empirical and physical models, machine learning regression models are better at parsing data and do not suffer from the ill-posed inversion problem (Kumar et al., 2021). As an important driving force in the development of artificial intelligence, machine learning has been widely used in many fields, including ecology and remote sensing (Júnior et al., 2020). Combining machine learning with remote sensing data has become an effective way to estimate biomass in a given area (Lakyda et al., 2019). Orusa et al. (2023) developed an algorithm for phenological indicator mapping using Landsat and Sentinel-2 data on the Google Earth Engine (GEE) platform. Research results show that the accuracy of back propagation-artificial neural network (BP-ANN), random forest (RF), support vector machine (SVM), and k-nearest neighbor (k-NN) models is higher than that of traditional multi-factor regression models (López-Serrano et al., 2016; Mutanga et al., 2012; Nguyen et al., 2018; Yang et al., 2018). Compared with a linear regression model, machine learning can improve model accuracy when the biomass is more than 120 Mg·ha−1 (Gao et al., 2018). Most studies on aboveground biomass focus on coniferous forests, coniferous and broad-leaved mixed forests, and evergreen broad-leaved forests (Dai et al., 2016; Dimitrov & Roumenina, 2013; Hu et al., 2016; Luo et al., 2021; Nie et al., 2017; Shen et al., 2018; Stovall et al., 2017).
However, there is limited research on combining optical remote sensing information with machine learning to estimate the biomass of natural deciduous broad-leaved forests.

This study focuses on the development of quantitative models for biomass in the natural deciduous broad-leaved forest of Mazongling Nature Reserve in China. Vegetation indices and texture information were extracted from WorldView-2 remote sensing data. Terrain factors were extracted from a Digital Elevation Model (DEM), and ground-measured data were collected. An optimal remote sensing quantitative inversion model for biomass was then constructed using machine learning algorithms. With this model, the study estimated the forest biomass and analyzed its distribution. The results provide a scientific reference for the protection and utilization of forest resources in Mazongling Nature Reserve.

Materials and Methods

Overview of the Study Area

Mazongling Nature Reserve is located in the southwest of Jinzhai County, Anhui Province, China (115°31′-115°50′E, 31°10′-31°20′N; Fig. 1). It is part of Anhui Tianma National Nature Reserve and has a total area of 4640.85 ha. The reserve lies in the north subtropical humid monsoon climate zone, and it protects north subtropical evergreen-deciduous broad-leaved mixed forest as well as rare wild animals and plants. Tree species occurring in the reserve include Cunninghamia lanceolata (Lamb.) Hook., Pinus taiwanensis Hayata, Quercus serrata var. brevipetiolata (A.DC.) Nakai, Castanea seguinii Dode, and Cyclobalanopsis glauca (Thunb.) Oerst.; shrubs include Loropetalum chinense (R. Br.) Oliv., Rhododendron simsii Planch., and Rhus chinensis Mill. The highest elevation in the reserve is 1671 m, valleys crisscross the terrain, and the natural vegetation is lush. The annual average temperature is 13.3 °C, and the average temperature in summer is 20 °C. The annual sunshine duration is 2225.5 h. Rainfall is abundant, with an annual total of 1480 mm.

Fig. 1 Location of the study area

Research Data

Sample Plot Data

The sampling survey was conducted from July 23 to 31, 2019. To comprehensively investigate the forest resources in the study area, stratified and typical sampling methods were used to establish 35 deciduous broad-leaved forest plots of different ages and site conditions. Each sample plot measured 20 m × 20 m. All living trees in the plots with a diameter at breast height greater than 5 cm were measured, and tree heights were measured using a laser range finder. Differential GPS (DGPS) was used to determine the locations of the sample plots. The dominant species in the study area were hardwood tree species, so the forest biomass of the sample plots was calculated using the general calculation method of hardwood biomass proposed by Li and Lei (2010). Based on the 6th and 7th Chinese National Forest Inventory data, Li and Lei proposed a calculation model for hardwood tree species after comparing three estimation methods (i.e., the Intergovernmental Panel on Climate Change method, the Continuous Biomass Expansion Factor method, and the Empirical (Regression) Model Estimation method). The model has been widely used in China because of its high accuracy and good applicability. Its specific formula is

$$W = 0.044\left( {D^{2} H} \right)^{0.9169} + 0.023\left( {D^{2} H} \right)^{0.7115} + 0.0104\left( {D^{2} H} \right)^{0.9994} + 0.0188\left( {D^{2} H} \right)^{0.8024} ,$$
(1)

where W (Mg·ha−1) is the forest biomass, D (cm) is the diameter at breast height (DBH), and H (m) is the tree height. The biomass estimated with Eq. (1) and the locations of the 35 sample plots (Fig. 1, Table 1) were used to establish a forest biomass model by machine learning.
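As an illustration of how Eq. (1) is evaluated, the following minimal Python sketch sums its four power-law terms for one pair of D and H. The function name and the input values are hypothetical, the result is returned in whatever unit Eq. (1) yields, and the sketch is not the authors' implementation (the study's modelling was done in R).

```python
def hardwood_biomass(d_cm: float, h_m: float) -> float:
    """Evaluate the right-hand side of Eq. (1) for one D (cm) and H (m)."""
    if d_cm <= 0 or h_m <= 0:
        raise ValueError("D and H must be positive")
    d2h = d_cm ** 2 * h_m
    # (coefficient, exponent) pairs copied term by term from Eq. (1)
    terms = [
        (0.044, 0.9169),
        (0.023, 0.7115),
        (0.0104, 0.9994),
        (0.0188, 0.8024),
    ]
    return sum(a * d2h ** b for a, b in terms)


# Hypothetical input values, chosen only to show the call
w_example = hardwood_biomass(d_cm=20.0, h_m=15.0)
```

Because every exponent is positive, the value grows monotonically with D²H, so larger or taller trees always yield a larger W.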

Table 1 Biomass statistics of sample plots

Remote Sensing Data

WorldView-2 satellite images from June 23, 2019 were used as the remote sensing data. The spatial resolution of the panchromatic and multispectral images was 0.46 m and 1.85 m, respectively. Their band information is shown in Table 2. Radiometric correction was conducted using ENVI5.3 software to obtain radiance data. The MODTRAN4+ radiative transfer model was used for atmospheric correction of the radiance data to obtain reflectance data. The Gram-Schmidt transform was used to fuse the panchromatic and multispectral images into true color high-resolution images. Topographic maps at a scale of 1:10,000 were used for geometric correction of the remote sensing data, and the RMSEs were kept within 1 pixel.

Table 2 Multispectral information for WorldView-2 remote sensing

Remote Sensing Classification of Forest Types

According to the size of the sample plots, the characteristics of forest resources in the study area, and field investigation results, land cover was categorized into four types: deciduous broad-leaved forest, coniferous forest, coniferous and broad-leaved mixed forest, and non-forest land. After preprocessing of the WorldView-2 data, the RF, maximum likelihood, and Mahalanobis distance methods were applied in ENVI5.3 to classify forest types. Verification data and the Kappa coefficient were used to test the classification accuracy. After classification, majority/minority processing was conducted to reassign broken patches in the original classification results to the background category.
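The Kappa coefficient used for this accuracy test can be computed from a confusion matrix of verification samples. The sketch below uses the standard definition of Cohen's kappa; the function name and the example matrix are hypothetical and do not reproduce the study's verification data.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Rows are reference classes, columns are predicted classes.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class verification matrix
kappa_example = cohen_kappa([[20, 5], [10, 15]])
```

Kappa discounts the agreement that would occur by chance, which is why it is lower than the plain overall accuracy for the same matrix.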

Feature Selection

The coordinates of the center point of each sample plot were used to locate the center pixel. The average pixel value within a 20 × 20 window served as the remote sensing feature. Vegetation indices and gray-level co-occurrence matrix (GLCM) texture information were extracted. After comparing different sizes, the window size for the GLCM texture information was set to 9 × 9, with the default 0° direction and pixel statistical interval. Terrain factors, such as slope, aspect and elevation, were extracted from Digital Elevation Model data at a resolution of 12.5 m using the ArcGIS10.2 platform. In total, 36 candidate factors were selected: NDVI, RVI, EVI, DVI, SAVI, MSAVI, B532_entropy, B3_entropy, B4_entropy, B5_entropy, B532_secondary moment, B3_secondary moment, B4_secondary moment, B5_secondary moment, B532_dissimilarity, B3_dissimilarity, B4_dissimilarity, B5_dissimilarity, B532_mean, B3_mean, B4_mean, B5_mean, B532_homogeneity, B3_homogeneity, B4_homogeneity, B5_homogeneity, B532_correlation, B5_correlation, B532_contrast, B3_contrast, B5_contrast, B532_variance, B3_variance, B4_variance, B5_variance, and Slope. The types and detailed descriptions of the modeling factors are shown in Table 3.
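For orientation, the six vegetation indices named above can be computed from surface reflectance as in the following sketch. It uses the commonly published definitions (with a soil adjustment factor of 0.5 for SAVI and the closed-form MSAVI), which is an assumption: the exact formulas and band assignments used in the study are those in Table 3. The function name and input reflectances are hypothetical.

```python
from math import sqrt


def vegetation_indices(nir, red, blue, soil_l=0.5):
    """Common vegetation indices from reflectance values in the range 0-1."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "RVI": nir / red,
        "DVI": nir - red,
        "EVI": 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0),
        "SAVI": (1.0 + soil_l) * (nir - red) / (nir + red + soil_l),
        "MSAVI": (2.0 * nir + 1.0
                  - sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0,
    }


# Hypothetical reflectances for a single vegetated pixel
vi = vegetation_indices(nir=0.5, red=0.1, blue=0.05)
```

In practice these would be evaluated per pixel and then averaged over the plot window described above.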

Table 3 Biomass modelling factors of natural deciduous broad-leaved forest in Mazongling Nature Reserve

Model Variable Selection

The Boruta and Recursive Feature Elimination (RFE) algorithms in R were used to select variable sets related to the dependent variable. The Boruta algorithm is based on the same idea as a random forest classifier. It adds randomness to the system and collects results from an ensemble of randomized samples to assess the importance of each feature. This iterative process reduces the misleading impact of random fluctuations and correlations (Amiri et al., 2019). The RFE algorithm trains a model on a training set using all predictors, calculates the importance of each variable, and ranks the variables in order to find an optimal variable set. RFE seeks to improve generalization performance by removing the least important features, whose deletion has the least effect on training error (Hayet et al., 2020). Because the variables retained by the Boruta algorithm could be highly correlated, we removed highly correlated variables using the Pearson correlation coefficient. We set the threshold to 0.9 so that the absolute value of the correlation coefficient between any two prediction variables was below 0.9. This procedure reduces the excessive discarding of prediction variables caused by collinearity among them. Finally, B3_mean, B3_secondary moment, B3_variance, B4_secondary moment, B5_mean, Slope, and NDVI were selected as predictors.
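The Pearson correlation screening step can be sketched as a greedy filter: a variable is kept only if its absolute correlation with every variable already kept is below the threshold. The text does not state which member of a correlated pair was dropped, so the order-based rule here is an assumption, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Keep variables whose |r| with all previously kept ones is below threshold.

    `features` maps a variable name to its list of per-plot values.
    Variables are visited in insertion order (an assumed tie-breaking rule).
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: "b" is almost a multiple of "a", "c" is only weakly related
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
selected = drop_correlated(toy, threshold=0.9)
```

Ordering the variables by importance before filtering (for example, by the Boruta ranking) would make the rule keep the more informative member of each correlated pair.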

Machine Learning Algorithm

We used the k-NN, ANN, and RF machine learning algorithms in the platform of RStudio to construct a forest biomass model.

k-Nearest Neighbour (k-NN) Method

The k-NN algorithm is a typical non-parametric algorithm that estimates biomass from the observations at neighboring sampling points (Hoef & Temesgen, 2013). For each object to be predicted, k-NN finds the k training samples that are closest to it in the predictor variable space and takes the average of their response values as the prediction (McRoberts et al., 2016). Euclidean distance, the straight-line distance between two observations, \(d_{{(x_{a} ,x_{b} )}}\), is a common distance measure for constructing a forest biomass model based on k-NN. It is defined in Eq. (2).

$$d_{{\left( {x_{a} ,x_{b} } \right)}} = \sqrt {\sum\limits_{i = 1}^{P} {\left( {x_{ai} - x_{bi} } \right)}^{2} } ,$$
(2)

where \(x_{a}\) and \(x_{b}\) are two sample points, and \(P\) is the dimension of each sample.

The k-NN method is flexible and transparent, and it has strong generalization ability. However, when there are many features, many feature combinations are generated, which reduces prediction efficiency and model accuracy. The hyperparameter ‘k’, the number of nearest points to the target that are averaged, must be set when modelling in R. If k is too small, the model is overly sensitive to the training data and its stability is poor. If k is too large, the average is taken over too wide a neighborhood and the prediction error is large (Kumar et al., 2021). In practice, k ranges from 3 to 10.
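A minimal Python sketch of k-NN regression with the Euclidean distance of Eq. (2) is given below. It is illustrative only: the function names and toy data are hypothetical, and the study itself fitted k-NN in R.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two samples of equal dimension P (Eq. 2)."""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, query, k):
    """Average the responses of the k training samples nearest to `query`."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


# Toy one-dimensional example: the distant fourth sample is ignored when k = 3
toy_x = [[0.0], [1.0], [2.0], [10.0]]
toy_y = [10.0, 20.0, 30.0, 100.0]
toy_prediction = knn_predict(toy_x, toy_y, query=[1.1], k=3)
```

Because the distance sums squared differences across all predictors, predictors on very different scales are usually standardized first so that no single variable dominates.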

Artificial Neural-Network (ANN) Method

The ANN used here is a multi-layer feed-forward neural network with forward propagation of information and backward propagation of error (Fig. 2). First, information is processed layer by layer from the input layer through the hidden layer to the output layer, and the outputs are compared with the expected outputs. Backward propagation is performed when the error between the model outputs and the expected outputs is greater than a predetermined value. The internal weights and thresholds of the network are then adjusted according to the prediction error, and the network returns to forward propagation. This process is repeated until the error falls to the predetermined value, so that the outputs and the expected values are close enough to each other (Dong et al., 2020; Mao et al., 2019).

Fig. 2 Artificial neural network

The ‘decay’ and ‘size’ parameters are required when using the ‘nnet’ package in R to build an ANN model. The ‘decay’ parameter is a penalty on the sum of squares of the weights. Using ‘decay’ both helps the optimization process and avoids over-fitting (Raji et al., 2020). ‘Decay’ was set to 0.001, 0.01, and 0.1 to reduce the possibility of over-training. ‘Size’ is defined as

$${\text{size}} = \sqrt {P + O} + m,$$
(3)

where ‘size’ is the number of hidden units, P is the number of nodes in the input layer, O is the number of nodes in the output layer, and m is an integer constant between 0 and 10.
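As a worked example of Eq. (3), the sketch below lists the candidate hidden-layer sizes for the seven selected predictors and a single output, and combines them with the three decay values into a tuning grid. Rounding the square root down to an integer is an assumption (the text does not state the rounding rule), although it reproduces the 2 to 12 range of sizes compared in the Results; the function name is hypothetical.

```python
from math import floor, sqrt


def hidden_sizes(p, o, m_values=range(0, 11)):
    """Candidate hidden-unit counts from Eq. (3): sqrt(P + O) + m."""
    base = floor(sqrt(p + o))  # assumed rounding: truncate to an integer
    return [base + m for m in m_values]


sizes = hidden_sizes(p=7, o=1)   # 7 predictors, 1 output (biomass)
decays = [0.001, 0.01, 0.1]      # decay values stated in the text
grid = [(size, decay) for size in sizes for decay in decays]
```

With P = 7 and O = 1, sqrt(8) is about 2.83, so the sizes run from 2 (m = 0) to 12 (m = 10), giving 33 size and decay combinations to evaluate.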

Random Forest (RF) Method

RF is a classifier that contains multiple decision trees, and it uses multiple decision-tree algorithms to carry out repeated predictions for the same inputs (Dong et al., 2020). Multiple random samples can be obtained to establish the corresponding decision trees through several rounds of bootstrap sampling. In this way, a random forest is formed.

RF regression was carried out using the ‘randomForest’ package in R. Two key parameters are involved: ntree and mtry. ‘Ntree’ is the number of decision trees, which is also the number of bootstrap resamples. ‘Mtry’ is the number of input variables randomly sampled as candidates at each split; for regression, it defaults to one-third of the number of input variables. However, ‘mtry’ needs to be tuned to achieve an optimal value (Tavares Júnior et al. 2020).
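The two sources of randomness behind ntree and mtry can be sketched as follows: each tree is grown on a bootstrap resample of the plots, and each split considers only a random subset of mtry predictors. This is a minimal illustration with hypothetical function names, not a full random forest and not the R package's code.

```python
import random


def default_mtry(n_predictors):
    """Regression default: one-third of the predictors, at least one."""
    return max(n_predictors // 3, 1)


def bootstrap_indices(n_samples, rng):
    """Draw n_samples plot indices with replacement (one draw per tree)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def split_candidates(n_predictors, mtry, rng):
    """Pick mtry distinct predictor indices to consider at one split."""
    return sorted(rng.sample(range(n_predictors), mtry))


rng = random.Random(0)                       # fixed seed for repeatability
plots_for_tree = bootstrap_indices(35, rng)  # 35 sample plots
mtry = default_mtry(7)                       # 7 selected predictors
candidates = split_candidates(7, mtry, rng)
```

Setting mtry equal to the number of predictors removes the per-split randomness, so every tree differs only through its bootstrap sample.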

Model Accuracy Assessment

Model accuracy was verified using leave-one-out cross-validation. For N samples, each sample in turn is taken as the test set, and the remaining N-1 samples are used as the training set. This procedure is repeated N times, yielding N models, and the average of the N results is taken as the final performance index. The method uses almost all the samples to train each model, so the evaluation results are more reliable. It involves no randomness, and the entire process is repeatable (Wolfrum et al., 2020). The coefficient of determination (R2; Eq. (4)) and root mean square error (RMSE; Eq. (5)) were used to evaluate the models. Generally, a greater R2 and a lower RMSE indicate a better model fit.

$$R^{2} = \frac{{\left[ {\mathop \sum \nolimits_{i = 1}^{N} (x_{i} - \overline{x})\left( {y_{i} - \overline{y}} \right)} \right]^{2} }}{{\mathop \sum \nolimits_{i = 1}^{N} (x_{i} - \overline{x})^{2} \mathop \sum \nolimits_{i = 1}^{N} (y_{i} - \overline{y})^{2} }},$$
(4)
$$RMSE = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{N} (y_{i} - x_{i} )^{2} }}{N}} ,$$
(5)

where \(x_{i}\) is the measured value of the i-th sample plot, \(y_{i}\) is the model estimated value of the i-th sample plot, \(N\) is the number of sample plots, \(\overline{x}\) is the average of the measured values, and \(\overline{y}\) is the average of the estimated values.
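The leave-one-out procedure and the two accuracy measures can be sketched in Python as follows. R2 is computed here as the squared Pearson correlation between measured and estimated values, which is an assumption about how Eq. (4) is meant to be read. The one-nearest-neighbour stand-in model, the function names, and the toy data are hypothetical.

```python
from math import sqrt


def rmse(measured, estimated):
    """Root mean square error between measured and estimated values (Eq. 5)."""
    n = len(measured)
    return sqrt(sum((y - x) ** 2 for x, y in zip(measured, estimated)) / n)


def r_squared(measured, estimated):
    """Squared Pearson correlation of measured and estimated values."""
    n = len(measured)
    mx, my = sum(measured) / n, sum(estimated) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, estimated))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in estimated)
    return sxy ** 2 / (sxx * syy)


def loocv(xs, ys, fit_predict):
    """Predict each sample from a model trained on the other N-1 samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(fit_predict(train_x, train_y, xs[i]))
    return preds


def nearest_neighbour(train_x, train_y, query):
    """Stand-in model: return the response of the closest training sample."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]


# Toy single-predictor data (hypothetical feature and biomass values)
xs = [3, 4, 6, 7, 9]
ys = [50, 80, 120, 150, 200]
preds = loocv(xs, ys, nearest_neighbour)
loocv_rmse = rmse(ys, preds)
loocv_r2 = r_squared(ys, preds)
```

Any of the three models can be plugged in through `fit_predict`, so the same loop yields comparable R2 and RMSE values for each hyperparameter setting.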

Results

Forest Type Classification in Mazongling Nature Reserve

The Kappa coefficients for the remote sensing classification of forest types using RF, maximum likelihood, and Mahalanobis distance methods were 0.97, 0.92, and 0.80, respectively. We selected the RF method with the greatest Kappa coefficient to classify the forest types (Fig. 3). Deciduous broad-leaved forest covered 2275.97 ha (49.04%), coniferous forest covered 1163.71 ha (25.08%), coniferous and broad-leaved mixed forest covered 735.38 ha (15.85%), and non-forest covered 465.78 ha (10.04%) of the total study area. Among these four different forest types, deciduous broad-leaved forest was primarily distributed in the Lingtou zone.

Fig. 3 Classification of forest types using the random forest method

Construction of Remote Sensing Quantitative Model of Forest Biomass

The greatest R2 and the smallest RMSE from three models were determined by using leave-one-out cross-validation. The results are shown in Table 4 and Fig. 4.

1. For the RF model, the maximum RMSE was 36.83 Mg·ha−1 when mtry was set to 1, and the minimum was 32.27 Mg·ha−1 when mtry was set to 7. Model precision was highest when mtry was 7, with R2 and RMSE of 0.68 and 31.85 Mg·ha−1, respectively.

2. For the k-NN model, the maximum RMSE was 46.11 Mg·ha−1 when k was 9, and the minimum RMSE was 40.74 Mg·ha−1 when k was 5. RMSE gradually increased as k increased, and the model was most accurate when k was 5, with R2 and RMSE of 0.48 and 40.74 Mg·ha−1, respectively.

3. For the ANN model, three values of decay (0.001, 0.01, and 0.1) and hidden layers with 2 to 12 hidden units were compared. The model was most accurate when decay = 0.1 and size = 2, with R2 and RMSE of 0.69 and 31.53 Mg·ha−1, respectively.

Table 4 Comparison of results between the three machine learning models in this study and other studies

Fig. 4 Root mean square error

Therefore, the most accurate ANN model was selected to construct the remote sensing quantitative estimation model of natural deciduous broad-leaved forest biomass in Mazongling Nature Reserve.

Spatial Distribution of Deciduous Broad-Leaved Forest Biomass in Mazongling Nature Reserve

The verification results of the optimal regression model using leave-one-out cross-validation are shown in Fig. 5. The ANN model gave the most accurate prediction (R2 = 0.69, RMSE = 31.53 Mg·ha−1). With this optimal ANN model, the aboveground biomass (AGB) of natural deciduous broad-leaved forest in Mazongling Nature Reserve was estimated from the WorldView-2 images (Fig. 6). The biomass estimated by the model was 90.34 ± 47.96 Mg·ha−1. High AGB of natural deciduous broad-leaved forest was found primarily in the Lingtou and Heshangping zones, followed by the Dacaoping and Dongshan zones. The lowest AGB (48 Mg·ha−1) was in the Qianping Village zone.

Fig. 5 Scatter diagram of correlation between model predicted values and measured values of forest biomass in Mazongling

Fig. 6 Spatial distribution of broad-leaved deciduous forest biomass in Mazongling Nature Reserve

Discussion

Because of the complex vegetation and numerous tree species in the sample plots, we did not use the standard wood method. The biomass in the sample plots was calculated using the general calculation method of hardwood biomass proposed by Li and Lei based on the 6th and 7th Chinese National Forest Inventory data (Fu et al., 2022; Huang et al., 2022; Ju et al., 2022). The biomass of the different tree organs is calculated mainly from two parameters, tree height and DBH, and the R2 of the height curves of other hardwood broad-leaved trees reaches 0.95. Future research should increase the number of sample plots so that all age classes are included. Model accuracy can also be improved by using a biomass model developed for the same zone, family, or genus.

The WorldView-2 remote sensing image in this study was acquired in June 2019, when vegetation in the study area was in the growing season and relatively lush. Because different objects can share the same spectrum and the same objects can show different spectra in the image (Ashutosh & Roy, 2021), the classification contains omissions and mistakes even though its Kappa coefficient is very high. Examples include confusion between "coniferous forest" and "coniferous and broad-leaved mixed forest", and between "broad-leaved forest" and "coniferous and broad-leaved mixed forest". The WorldView-2 images also had a small amount of cloud cover, which slightly affected the classification of forest types and the inversion of forest biomass. However, cloud accounted for only 4.8% of the study area, which met the cloud content requirement (< 10%) for analyzing remote sensing images. The images were therefore not de-clouded, so as to avoid the loss of detail that de-clouding can cause. Regional image replacement can reduce the influence of cloud cover and improve image utilization. In addition, this study was based on only one year of remote sensing data (i.e., 2019). Further research on the spatial changes of forest biomass is therefore necessary to improve the accuracy of model estimation.

The predictors selected in this study were sufficient to construct a remote sensing quantitative model of deciduous broad-leaved forest biomass in Mazongling Nature Reserve. However, collinearity among the predictors was not pronounced, and the linear correlation between forest biomass and the individual factors was not high. In addition, biomass was positively correlated with some predictors and negatively correlated with others. A linear model was therefore not suitable for capturing the relationship between biomass and the remote sensing and geographic factors, whereas an ANN model, with its strong nonlinear fitting ability, was better suited to describing this relationship.

The results showed that the ANN model had the highest accuracy, with R2 = 0.69. This is lower than that of the multiple linear regression biomass model (Wei, 2019). Multiple linear regression models need to be compared with machine learning models in order to fully study the differences between them and to provide a sufficient basis for selecting a more accurate inversion model. This study therefore reviewed many references on machine learning algorithms for estimating forest biomass, especially for broad-leaved forest. The R2 and RMSE values reported in these studies differed considerably. On the one hand, the biomass accumulated through normal growth differs with the site conditions (soil, climate, terrain, etc.) of each forest type. On the other hand, differences in the candidate modelling factors also play an important role in model construction. In addition, the model was less accurate than the lidar-based model developed for the Alps by Montagnoli et al. (2015). This could be due to light saturation in the WorldView-2 remote sensing images. The vegetation density of the deciduous broad-leaved forest in Mazongling Nature Reserve was so high that the electromagnetic radiation received by the sensor could no longer reflect changes in biomass, which led to inaccurate estimates in areas with high biomass. As a result, the vegetation index and texture factor data fluctuated only slightly in some areas, affecting model accuracy and biomass inversion. Further research is therefore needed to determine the saturation point of the remote sensing data and to improve the accuracy of remote sensing estimation of forest biomass. This study focused mainly on biomass modelling of deciduous broad-leaved forest. Remote sensing biomass inversion models for Pinus forest, Taxodium forest, coniferous and broad-leaved mixed forest, and mixed forest should also be constructed separately, which would make it possible to compare the consistency and differences between a mixed inversion model and single forest type biomass models (Raj & Jhariya, 2021; Wang et al., 2020).

The average biomass of deciduous broad-leaved forest in Mazongling Nature Reserve was 147.68 Mg·ha−1. Mazongling Nature Reserve showed no soil erosion and had good site quality, indicating that its management helped to increase biomass. However, the average biomass was slightly lower than that of Maoershan (153.63 Mg·ha−1) in the temperate zone (Liu et al., 2016). The minimum measured DBH for living standing trees was 2 cm in the Maoershan sample plots but 5 cm in Mazongling Nature Reserve, so the minimum measured DBH could also affect the estimation of aboveground biomass. Overall, the biomass of deciduous broad-leaved forest in Mazongling Nature Reserve was relatively high, indicating that the protection measures for forest resources were effective. The study area is a nature reserve with effective forest protection, where 50% of the fixed plots consist of near mature and mature forests. The average DBH and tree height were 18.8 cm and 11.57 m, respectively. Furthermore, recent forest resource statistics indicated that nearly 70% of the deciduous broad-leaved forest in the entire study area was near mature or mature forest. The AGB of Jiangxi Province is 141.21 Mg·ha−1, while China's subtropical forest has an AGB of 149.30 Mg·ha−1 (Liao et al., 2018; Ma et al., 2019). In Taizi Forest Farm, Hubei Province, located at the northern edge of the subtropical zone, the AGB of broad-leaved forest reaches 158.60 Mg·ha−1 (Jian et al., 2021). Although some of these studies did not distinguish forest types, on the whole the aboveground biomass per unit area they reported was higher than the estimate in this paper. The mean AGB obtained from the model in this paper (90.34 ± 47.96 Mg·ha−1) was therefore relatively low. Few studies have addressed remote sensing biomass inversion for deciduous broad-leaved forest alone, so this study helps fill this gap. The research methods in this study can also provide a reference for biomass research in other geographic regions.

Conclusions

The forest types in Mazongling Nature Reserve were classified in this study using WorldView-2 remote sensing data and three supervised classifiers: RF, maximum likelihood, and the Mahalanobis distance method. Their Kappa coefficients were 0.97, 0.92, and 0.80, respectively. The RF classification results showed that the area of deciduous broad-leaved forest was 2275.97 ha, which accounted for 49.04% of the study area. Vegetation index and texture information extracted from the remote sensing images were combined with terrain factors to provide 36 candidate variables. The Boruta algorithm, the RFE algorithm, and the Pearson correlation coefficient were used to select suitable modelling factors: B3_mean, B3_secondary moment, B3_variance, B4_secondary moment, B5_mean, Slope, and NDVI. The forest biomass of the 35 sample plots was calculated using the general calculation method of hardwood biomass proposed by Li and Lei, and the values ranged from 31.69 Mg·ha−1 to 239.36 Mg·ha−1. Three machine learning algorithms (RF, k-NN, and ANN) were used to construct the optimal biomass inversion model of deciduous broad-leaved forest for Mazongling Nature Reserve. The R2 values of the three algorithms were 0.68, 0.48, and 0.69, respectively, and the RMSE values were 32.27 Mg·ha−1, 40.74 Mg·ha−1, and 31.53 Mg·ha−1, respectively. The average biomass estimated using the ANN model was 90.34 ± 47.96 Mg·ha−1. The area with the highest forest biomass was the Lingtou zone, with 191 Mg·ha−1. Mazongling Nature Reserve was founded in 1958 and became a national nature reserve in 1998. As the core area of the reserve, the Lingtou zone has seldom been subject to human disturbance, so its vegetation is well protected. The lowest forest biomass value (48 Mg·ha−1) was found in the Qianping Village zone, which mainly comprises residential areas with relatively flat terrain, gentle slopes, low elevation, and a large amount of cultivated land and buildings, and where human production and management activities are frequent. The machine learning algorithms used in this study could achieve higher accuracy and better generalization ability than the traditional linear model for biomass estimation. The research methods and results fill a gap in the biomass estimation of deciduous broad-leaved forest in this study area. Furthermore, they lay a foundation for further research on the dynamic changes of forest biomass, the driving forces of biomass change, and forest carbon sequestration.