Introduction

Forests are a vital component of our ecosystems, playing a crucial role in maintaining biodiversity, regulating climate, and providing resources essential for human sustenance (FAO 2020). Forest plantations, strategically managed for sustainable timber production, can therefore contribute significantly to meeting the growing demand for wood products and to their sustainable use. Many plantation management practices are currently under review in order to meet more sustainable management goals at local and global level (McEwan et al. 2020). Among these, the management of logging residues is paramount, since it influences both ecological integrity and economic viability.

Logging residues, i.e. the byproducts of timber harvesting, comprise branches, bark, foliage and other biomass left behind after the extraction of commercially utilizable timber (Harmon and Sexton 1996; Titus et al. 2021). Gaining a wider understanding of the role and dynamics of these residues is pivotal, as they influence carbon and nitrogen cycling (Achat et al. 2015; James et al. 2021), nutrient cycling (Janowiak and Webster 2010), soil physical structure (Trindade et al. 2021), biodiversity (Law and Kolb 2007; Fritts et al. 2017; Grodsky et al. 2019) and overall ecosystem health (Bose et al. 2023). The wide array of residue management options, together with the risks that residues can pose to the forest ecosystem (e.g., forest fires and pest outbreaks), demands thoughtful consideration when optimizing forest management practices.

The management of logging residues presents challenges that intersect the environmental and economic dimensions. Reaching a balance between these competing interests requires innovative solutions that address concerns related to soil degradation, carbon sequestration, and cost-effective management. In addressing these challenges, opportunities arise for advancements in technology and data-driven approaches.

In recent years, technological innovations have revolutionized forestry practices. Unmanned aerial vehicles (UAVs), commonly known as drones, have emerged as indispensable tools in the forestry sector. These aerial platforms offer a versatile means of data collection, enabling efficient monitoring of forested landscapes (Buchelt et al. 2024). Drones have found application in various facets of forest research, ranging from mapping and monitoring to assessing tree health and biodiversity. Their ability to cover large areas with speed and precision has redefined the possibilities for data collection. According to the analysis of Buchelt et al. (2024), drones are not yet widely used in forest operations and harvesting, although they could contribute to our understanding of logging residue distribution (Udali et al. 2022), offering insights that were once challenging to obtain through conventional means. Moreover, the integration of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) with drone technology to process large datasets of stratified information marks a paradigm shift in data analysis (Ongsulee 2018). The synergy between drone capabilities and artificial intelligence opens new avenues for comprehensive and nuanced forest management.

This integration has already been tested in forest operations in recent years, with positive results. Windrim et al. (2019) used a technique based on Convolutional Neural Networks (CNNs) to detect individual logs and stumps and to segment the image surface into woody material and background. Their application detected woody debris and stumps with an overall accuracy of 84%. Puliti et al. (2018) performed stump detection on UAV photogrammetric data using an iterative region-growing approach, achieving an accuracy of 68–80%. Đuka et al. (2023) compared butt-log volume estimation from field measurements and four photogrammetric approaches with the ultimate goal of optimizing timber transport, and estimated log dimensions with a high level of correlation (R2 ≥ 0.88) and a fairly good relative volume bias (< 10%).

The aim of this study is to evaluate the quantification and distribution of logging residues in clear-felled plantation stands. In particular, the objectives are (i) to assess the residue distribution obtained from an ML classification model fed with drone-based images, and (ii) to compare the volume estimated from the classified images with the volume obtained from in-field observation.

Materials and methods

Study sites

The study sites are all industrial forest plantations located in the KwaZulu-Natal province of South Africa (Fig. 1). The province has a varied yet verdant climate thanks to its diverse, complex topography. The hinterland where the sites are located is characterized by a dry-winter humid subtropical climate, where summers are warm and occasionally hot, with frequent rainfall. Winters are dry with high diurnal temperature variation and possible light air frosts (South African Weather Service).

Fig. 1

Location of the study sites in South Africa. From the top right clockwise, Eucalypt sites E1, E2 and E3 followed by Pine sites P1 and P2. Background image from Google Satellite

The species planted on the sites are a hybrid clone of Eucalypt (Eucalyptus grandis W.Hill x Eucalyptus nitens (H. Deane & Maiden) Maiden), E. dunnii Maiden, E. smithii (R.T. Baker.) and Mexican weeping pine (Pinus patula Schiede ex Schltdl. & Cham.), hereafter identified as Eucalypt and Pine, respectively. The harvesting and extraction operations were performed on all sites in 2022; the timber was either manually or mechanically felled and then transported by forwarder to the depot. On the mechanically felled sites, the machine used was a Tigercat LH822D equipped with a Log Max 7000XT head working with the cut-to-length (CTL) technique, whereas the forwarder was a Tigercat 1075C. Table 1 summarizes the main data for the selected study sites.

Table 1 Summary of data related to the study sites selected

Field sampling, target material and volume estimation

Field sampling of logging residues was performed by adopting a line intersect sampling (LIS) method, which estimates volumes of downed woody material (Brown 1974; Woodall and Monleon 2008) over completely clear-felled areas. The main assumptions of this sampling technique are that the pieces of woody debris crossed by the linear transect are randomly oriented, lie horizontally, have a circular cross-section, and are normally distributed within diameter classes. As a rule of thumb, a minimum ratio of 1.5 plots ha−1 was adopted to ensure sufficient coverage and representation of each site.

For this study, the sampling method (Fig. 2a) was initially adapted from Rizzolo (2016), registering the diameter of each piece of woody debris under a 20 m line and classifying the pieces into time lag classes (Table 8). Each plot is composed of three sampling lines arranged as an equilateral triangle (Ross and du Toit 2004) with random orientation, with each triangle at least 25 m from other triangles and from the roadside. The time lag division of woody material refers to the time required for a fuel particle to adjust its moisture content to the equilibrium moisture content, but it can be easily adapted to other applications: material finer than 76 mm in diameter belongs to fine woody debris (FWD) and larger material to coarse woody debris (CWD). The length was also recorded for CWD larger than 203 mm.

Fig. 2

a Example of a transect layout and its location with respect to a hypothetical wheel rut (the layout can be applied generally); b transect over one of the Pine study sites

The residue volume estimation (m3 ha−1) was performed for all categories of residues. For the 1 h, 10 h, 100 h and 1000 h classes, Brown’s formula was used (Brown 1974; Woodall and Monleon 2008), as described in Eq. 1. For bigger elements, i.e. those falling in the 1000 h+ class, the estimator was computed using the Woodall formula (Woodall and Monleon 2008), expressed by Eq. 3.

$${\widehat{Y}}_{ABCD}= \left(\frac{1.234 \cdot n\cdot {\overline{d} }^{2}\cdot c\cdot a}{\sum L}\right)\cdot {k}_{decay}\cdot 10{,}000$$
(1)
$$c= \sqrt{1+{\left(\frac{{Slope}_{\%}}{100}\right)}^{2}}$$
(2)
$${\widehat{Y}}_{E}= \frac{\pi }{2L}\cdot \sum_{i=1}^{n}\left(\frac{{y}_{i}}{{l}_{i}}\right)\cdot {k}_{decay}\cdot 10{,}000$$
(3)

where: 1.234 is a conversion constant derived empirically; n is the number of elements for each class; \({\overline{d} }^{2}\) is the average squared diameter for the class; c is the slope correction factor (Eq. 2); a is the correction coefficient for the position of the elements, equal to 1.13 for FWD and to 1 for CWD; L is the length of the sampling line(s); kdecay is the decay coefficient as described by Woodall and Monleon (2008); 10,000 is the number of square meters in 1 ha; yi is the volume of a single CWD piece and li is the length of that piece.
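The estimators in Eqs. 1 to 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, diameters and lengths are assumed to be in metres (so that the factor 10,000 converts m3 per m2 into m3 ha−1), and the slope correction is taken as the square root of 1 plus the squared fractional slope.

```python
import math


def slope_correction(slope_percent: float) -> float:
    """Slope correction factor c, taken here as sqrt(1 + (slope% / 100)^2)."""
    return math.sqrt(1.0 + (slope_percent / 100.0) ** 2)


def brown_volume(n, mean_sq_diameter_m2, total_line_length_m,
                 slope_percent=0.0, a=1.0, k_decay=1.0):
    """Eq. 1: volume (m3 per ha) for one class tallied along the lines.

    Assumption: mean squared diameter in m2 and line length in m, so the
    final factor of 10,000 converts m3 per m2 into m3 per ha.
    """
    c = slope_correction(slope_percent)
    per_m2 = 1.234 * n * mean_sq_diameter_m2 * c * a / total_line_length_m
    return per_m2 * k_decay * 10_000


def woodall_volume(pieces, line_length_m, k_decay=1.0):
    """Eq. 3: volume (m3 per ha) for the largest pieces.

    `pieces` is an iterable of (volume_m3, length_m) pairs, one per piece.
    """
    ratio_sum = sum(volume / length for volume, length in pieces)
    return math.pi / (2.0 * line_length_m) * ratio_sum * k_decay * 10_000
```

For a triangular plot with three 20 m lines, a call such as `brown_volume(n=10, mean_sq_diameter_m2=0.02 ** 2, total_line_length_m=60.0, a=1.13)` would return the FWD volume for one class; the input values here are invented for illustration.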

UAV data acquisition and pre-processing

The whole methodology hereafter described is summarized in Fig. 3.

Fig. 3

Methodology workflow for this study, adapted from Udali et al. (2023)

UAV-based images were collected using a DJI Mavic Air 2S with a single flight at 65 m altitude with 80% forward and 70% lateral overlap to cover the entire study site. The flights were performed under clear-sky or scattered cloud cover conditions. Ground Control Points (GCPs) were placed in the sites and their positions were recorded.

The images were pre-processed with a Structure-from-Motion (SfM) technique using Agisoft Metashape®: the sparse point cloud obtained from the images was georeferenced using the GCPs, optimized to reduce the reprojection errors and used to generate the dense cloud. From this last element, both the orthophoto and the Digital Elevation Model (DEM) were derived.

To build the dataset for the classification model, similarly to Windrim et al. (2019), data related to logging residues were manually interpreted over the orthophoto in the QGIS environment by placing a 5 × 5 m square grid over the study area and randomly selecting 10% of the grid cells to provide data for the model (Table 2). In the selected grid cells, objects were manually photo-interpreted, segmented and divided into four classes: Coarse Woody Debris (CWD), Fine Woody Debris (FWD), ground, and stumps. CWD was defined in this study as larger pieces of debris, clearly distinguishable from the background, as opposed to smaller branches, foliage or bark (FWD). In this phase, all objects that were clearly distinguishable were digitized into classes for the selected grid cells (Fig. 4). Moreover, merchantable timber left in the area in stacks was filtered out and cropped from the images to avoid influencing either the classification or the volume estimation.
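The grid-based selection of interpretation cells can be illustrated with a short Python sketch. It assumes a rectangular extent in projected coordinates (metres) and a fixed random seed; the function names and the seed are hypothetical, and the original selection was carried out in QGIS rather than with this code.

```python
import random


def grid_cells(xmin, ymin, xmax, ymax, cell_size=5.0):
    """Return (x0, y0, x1, y1) bounds of the square cells inside the extent."""
    n_cols = int((xmax - xmin) // cell_size)
    n_rows = int((ymax - ymin) // cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * cell_size
            y0 = ymin + row * cell_size
            cells.append((x0, y0, x0 + cell_size, y0 + cell_size))
    return cells


def sample_cells(cells, fraction=0.10, seed=42):
    """Randomly select a fraction of the cells without replacement."""
    if not cells:
        return []
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    k = max(1, round(len(cells) * fraction))
    return rng.sample(cells, k)
```

A 100 m by 50 m extent yields 200 cells of 5 × 5 m, of which 20 would be selected for manual interpretation.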

Table 2 Number of sample grid areas laid out to visually interpret the objects from the orthomosaic for the classification algorithm
Fig. 4

Example of manual supervised segmentation into classes over a 5 × 5 m grid cell

Data elaboration

The following data elaboration steps were performed using R (R Core Team 2023) in the RStudio environment (R version 4.2.3; RStudio 2023.06.0+421 “Mountain Hydrangea”). Prior to every elaboration, both the DEM and the orthomosaic were resampled to a 3 cm resolution to match the minimum accountable dimension of medium-sized residues (100 h) and to compare the volume estimations based on the same pixel area (Table 3).

Table 3 Initial resolution (m) of orthomosaic and DEM obtained from Metashape for each study area

In the RStudio environment, the DEM was used to calculate the roughness and the Terrain Ruggedness Index (TRI), which represent the largest inter-cell difference between a central pixel and its surrounding cells, and the mean difference between a central pixel and its surrounding cells, respectively (Wilson et al. 2007). Moreover, curvature variables were also considered to integrate the topographic information of the sites and to fully exploit the microtopography derived from the UAV-based DEM. In particular, the newly computed terrain variables consisted of slope, tangential curvature, cross section curvature, minimum curvature, and profile curvature.
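To make the two ruggedness measures concrete, the sketch below computes them for a single cell of a small elevation grid. It assumes a 3 × 3 neighbourhood and follows a common reading of these indices (roughness as the range of values in the window, TRI as the mean absolute difference between the centre and its neighbours); the function names are hypothetical and the study itself computed these variables in R.

```python
def _window(dem, row, col, include_centre=True):
    """Values in the 3 x 3 window around (row, col), clipped at the edges."""
    n_rows, n_cols = len(dem), len(dem[0])
    values = []
    for r in range(max(0, row - 1), min(n_rows, row + 2)):
        for c in range(max(0, col - 1), min(n_cols, col + 2)):
            if include_centre or (r, c) != (row, col):
                values.append(dem[r][c])
    return values


def roughness(dem, row, col):
    """Largest elevation difference within the window (max minus min)."""
    values = _window(dem, row, col)
    return max(values) - min(values)


def tri(dem, row, col):
    """Mean absolute difference between the centre cell and its neighbours."""
    centre = dem[row][col]
    neighbours = _window(dem, row, col, include_centre=False)
    return sum(abs(v - centre) for v in neighbours) / len(neighbours)
```

On a 3 cm DEM these measures respond to objects a few pixels wide, which is why they can help separate residues from bare ground.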

From the orthomosaic, the RGB values were extracted and used to compute textural variables for each band using the grey level co-occurrence matrix (glcm) function with a window of size 3, producing five new accessory variables: mean, variance, homogeneity, contrast, and dissimilarity. Moreover, to account for the variability between the individual RGB layers, three vegetation indices were included: the Normalized Green–Red Difference Index (NGRDI; Eq. 4), the Green–Blue Vegetation Index (GBVI; Eq. 5), and the RGB Vegetation Index (RGBVI; Eq. 6).

$$NGRDI= \frac{Green-Red}{Green+Red}$$
(4)
$$GBVI= \frac{Green-Blue}{Green+Blue}$$
(5)
$$RGBVI= \frac{2\cdot Green-Red-Blue}{2\cdot Green+Red+Blue}$$
(6)
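Equations 4 to 6 translate directly into per-pixel functions. The following Python sketch implements them exactly as written above; the function names and the choice to return NaN when the denominator is zero are assumptions made for illustration.

```python
import math


def _ratio(numerator, denominator):
    """Safe division: NaN when the denominator is zero (e.g. black pixels)."""
    return numerator / denominator if denominator != 0 else math.nan


def ngrdi(red, green, blue):
    """Eq. 4: (Green - Red) / (Green + Red)."""
    return _ratio(green - red, green + red)


def gbvi(red, green, blue):
    """Eq. 5: (Green - Blue) / (Green + Blue)."""
    return _ratio(green - blue, green + blue)


def rgbvi(red, green, blue):
    """Eq. 6 as given in the text: (2G - R - B) / (2G + R + B)."""
    return _ratio(2 * green - red - blue, 2 * green + red + blue)
```

All three indices are normalized ratios of band differences, so they are less sensitive to overall brightness than the raw RGB values.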

Moreover, the DEM was also used to compute neighborhood variables to account for the variability around each pixel, using the focal function from the R package terra (Hijmans 2023) and a moving window of size 5. This resulted in three new variables describing the variability of the Digital Surface Model (DSM) surface around each pixel: the mean (mean_dsm), the variance (var_dsm) and the standard deviation (std_dsm).
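The moving-window statistics can be illustrated in plain Python for a single cell. This sketch is not the terra implementation: it assumes that the window is clipped at the raster edges and that the sample variance (n − 1 denominator) is used, and the function name is hypothetical.

```python
import statistics


def focal_stats(dsm, row, col, size=5):
    """Mean, variance and standard deviation of the size x size window
    centred on (row, col). Cells outside the raster are ignored."""
    half = size // 2
    n_rows, n_cols = len(dsm), len(dsm[0])
    values = [dsm[r][c]
              for r in range(max(0, row - half), min(n_rows, row + half + 1))
              for c in range(max(0, col - half), min(n_cols, col + half + 1))]
    mean = statistics.fmean(values)
    # Sample variance is an assumption; it needs at least two values.
    variance = statistics.variance(values) if len(values) > 1 else 0.0
    return mean, variance, variance ** 0.5
```

Applying the function to every cell would produce the three layers named mean_dsm, var_dsm and std_dsm in the text.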

The textural variables, the vegetation indices, the neighborhood variables, and the terrain variables were all stacked together. The shapefile containing the interpreted residues was used to sample the variable values from the stack, and the sampled values were collected into a database.

Classification and validation

The database containing the variables was split following a 70/30 ratio for training and testing, respectively. First the training set was used to feed a Random Forest model to predict the class attribute, using the R package randomForest (Liaw and Wiener 2002). The prediction model used a total of 30 variables, using 5 of them (mtry) at each split and maintaining the default number of trees (ntree = 500).
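As an illustration of the 70/30 split, the sketch below shuffles a list of records and cuts it at 70%. The function names and the seed are hypothetical, and whether the original split was stratified by class is not stated, so a simple random split is assumed. The helper `sqrt_mtry` only shows that 5 variables per split coincides with the integer square root of 30, a common default for classification forests.

```python
import math
import random


def train_test_split(records, train_fraction=0.7, seed=1):
    """Shuffle the records and split them into training and testing sets."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sqrt_mtry(n_variables):
    """Integer square root of the number of predictors."""
    return math.isqrt(n_variables)
```

With 30 predictors, `sqrt_mtry(30)` gives 5, the value of mtry reported above.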

The model obtained was then applied to the testing set and the results were expressed as a confusion matrix with accuracy indicators, as also suggested by Olofsson et al. (2014). In this case, the balanced accuracy for each class, the overall accuracy of the model and the Kappa index (McHugh 2012) were derived directly from the model summary. Balanced accuracy is the average of the sensitivity (i.e. the true-positive rate) and the specificity (i.e. the true-negative rate) for each class. Overall accuracy is the number of correct predictions divided by the total number of predictions made. The Kappa index, a measure of inter-rater and intra-rater reliability, was used here to evaluate the effectiveness of the classification by comparing the agreement between the classification results and the expected values. The index was interpreted following the guidelines of Landis and Koch (1977): values < 0 indicate no agreement, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1 almost perfect agreement.
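The accuracy indicators described above can be derived from a confusion matrix as in the following sketch. It assumes a square matrix with reference classes in rows and predicted classes in columns, and that every class occurs at least once; the function names are hypothetical.

```python
def overall_accuracy(matrix):
    """Correct predictions divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def balanced_accuracy(matrix, k):
    """Mean of sensitivity and specificity for class index k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # reference k, predicted other
    fp = sum(row[k] for row in matrix) - tp  # predicted k, reference other
    tn = total - tp - fn - fp
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def cohen_kappa(matrix):
    """Agreement corrected for the agreement expected by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal interpretation after Landis and Koch (1977)."""
    if kappa < 0:
        return "no agreement"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"
```

For a two-class matrix `[[40, 10], [5, 45]]`, the overall accuracy is 0.85 and Kappa is 0.70, which falls in the substantial agreement band.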

The model was then used to classify the orthomosaic raster to visualize the residue distribution over the sites.

Volume calculation

To perform the volume computation, the CWD and FWD polygons were extracted from the classified raster, filtered by selecting those with at least 120 connected pixels for both FWD and CWD, and used to calculate the volume for each polygon following a “local difference-of-height approach”.
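The size filter on connected pixels can be sketched as a flood fill over a boolean class mask. The connectivity rule is not stated in the text, so 4-connectivity is assumed here, and the function name is hypothetical.

```python
from collections import deque


def filter_small_patches(mask, min_pixels=120):
    """Keep only 4-connected True patches with at least min_pixels cells."""
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    kept = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected patch.
            patch, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                patch.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(patch) >= min_pixels:
                for y, x in patch:
                    kept[y][x] = True
    return kept
```

At 3 cm resolution, 120 pixels correspond to an area of about 0.11 m2 (120 × 0.03 m × 0.03 m).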

The volume was then calculated using a Relative Elevation Attribute (REA) as described by Carturan et al. (2009): in this case the Digital Terrain Model (DTM) was interpolated using a low-pass filter to build a proxy for the terrain (\({\overline{E} }_{int}\)) over the DSM (Straffelini et al. 2021). Considering the resolution of the DEMs (0.03 m) and the ruggedness of the terrain, the filter was built with a kernel of 101 × 101 cells. Whereas Straffelini et al. (2021) were looking at differences in height due to local depressions in the terrain, the aim here is to observe differences in height due to the presence of woody debris. Therefore, for each cell, the REA (in this case representing the height information of each cell) was obtained by subtracting the elevation value of the interpolated terrain from the DSM (Eq. 7). This information was then extracted for each residue polygon and used to compute the volume (Eq. 8).

The drone-based volume was then compared to the field-based volume by producing a volume ratio.

$$REA= {E}_{DSM}-{\overline{E} }_{int}$$
(7)
$$Volume= {A}_{polygon}\cdot {\overline{H} }_{REA}$$
(8)
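Equations 7 and 8 can be illustrated with a small pure-Python sketch. The low-pass filter is assumed here to be a simple moving average clipped at the raster edges, which may differ from the filter actually used; the function names are hypothetical, and a naive loop like this would be slow for a 101 × 101 kernel on a full raster.

```python
def low_pass(dsm, size):
    """Moving-average filter (size x size window, clipped at the edges)."""
    half = size // 2
    n_rows, n_cols = len(dsm), len(dsm[0])
    smoothed = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [dsm[y][x]
                      for y in range(max(0, r - half), min(n_rows, r + half + 1))
                      for x in range(max(0, c - half), min(n_cols, c + half + 1))]
            row.append(sum(values) / len(values))
        smoothed.append(row)
    return smoothed


def relative_elevation(dsm, size):
    """Eq. 7: REA = DSM elevation minus the smoothed terrain proxy."""
    terrain = low_pass(dsm, size)
    return [[dsm[r][c] - terrain[r][c] for c in range(len(dsm[0]))]
            for r in range(len(dsm))]


def polygon_volume(rea, cells, cell_size_m=0.03):
    """Eq. 8: polygon area times the mean REA over its (row, col) cells."""
    area = len(cells) * cell_size_m ** 2
    mean_height = sum(rea[r][c] for r, c in cells) / len(cells)
    return area * mean_height


def volume_ratio(drone_volume, field_volume):
    """Drone-based volume relative to the field-based volume."""
    return drone_volume / field_volume
```

A ratio above 1 indicates that the drone-based approach overestimates the field volume, and a ratio below 1 that it underestimates it.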

Results

Classification and validation

The balanced class accuracy and the model overall accuracy (OA) over the study sites are presented in Table 4. Overall, the class with the highest accuracy is Ground, identified with an average accuracy of 92% over the five sites. The second highest is the FWD class with 89%, followed by CWD and Stumps with 84% and 66%, respectively. Considering the overall accuracy, the classification model achieved positive results for all the sites, with values greater than 85% up to a maximum of 92% for E1. In Fig. 5 the sites classified according to the models are presented, showing the residue distribution over the harvesting areas.

Table 4 Balanced class accuracy and model overall accuracy of the random forest models over the study sites
Fig. 5

Examples of classified sites output from the Random Forest models. The blank areas within the Eucalypt areas had merchantable timber not yet extracted and were therefore masked out from the classification and volume estimation. A close-up detail of the classification is also shown

Volume estimation and comparison

The estimated volume per hectare for each study area, divided by residue class, is reported in Table 5. In general, Pine sites show higher CWD volumes for both classes compared to Eucalypt sites, especially for the 1000 h class. On the other hand, Eucalypt sites show higher volumes for the lowest-diameter material (1 h class) but lower values for the subsequent FWD classes. On average, a higher volume of material was left on Pine sites for almost all the residue classes considered, with the exception of the finest material, of which more was present on Eucalypt sites, and the coarsest material, for which no large difference was observed.

Table 5 Average volume (m3 ha−1) for residues (FWD—fine woody debris, CWD—coarse woody debris) in the study sites divided for each class

The volume obtained from the drone-based approach is presented in Table 6, together with the site average volumes for CWD and FWD and their relative comparison. The comparison was computed as the ratio of the drone volume to the field volume. On average, more volume was detected and retrieved for FWD than for CWD, with higher volumes on Eucalypt sites for both residue categories.

Table 6 Volume (m3 ha−1) comparison between drone-based estimations and field-based estimators

Discussion

Classification and validation

The evaluation of accuracies showed that, overall, the proposed method achieved good results in classifying harvesting residues in the study sites. The highest OA was achieved for E1, but this was also the site that was oversampled in terms of number of plots per hectare (4 plots ha−1) compared to the others (~ 1.5 plots ha−1). Considering tree species separately, the Pine sites showed, on average, better accuracy when OA is considered together with the class accuracies for FWD and Ground. This may be due to a more pronounced contrast in light and color between the classes, which the classification model then used to distinguish them. Compared to other studies where the classification of harvesting residues was the main goal or a secondary aim, the presented model achieved good results overall (Table 7). The proposed approach produced a robust result in classifying CWD material, with one of the highest accuracy values, whereas the performance in detecting stumps was more moderate, falling just outside the range of values reported by Puliti et al. (2018), possibly due to the difference in the tree species investigated. Compared only to the studies that used the same Random Forest model to classify CWD (Queiroz et al. 2019; Shokirov et al. 2021), this study shows higher values in both overall accuracy and class accuracy.

Table 7 A comparison of results between the present study and previous studies available in literature

The results achieved in this study are in line with previous studies using UAV-based information combined with ML techniques, although only a very limited number of such studies is available. Using a more advanced and complex method than ML algorithms (Ongsulee 2018), Windrim et al. (2019) presented a Faster-RCNN method relying only on RGB-based variables, without accounting for terrain-based information. Queiroz et al. (2019) proposed a similar approach combining spectral and LiDAR information to feed a Random Forest model. A more comprehensive approach was presented by Puliti et al. (2018), whose Random Forest model used a total of 31 variables including spectral (RGB and RGB-derived), dimensional and geometrical variables. On the other hand, Shokirov et al. (2021) used only terrain-related variables (e.g., slope, roughness, aspect, DSM, TPI and TRI) from different laser-based sources to feed their classification model. In this study, the classification model combined spectral and terrain variables, achieving the highest accuracy.

The models’ Cohen’s Kappa coefficients, ranging overall from 0.73 to 0.90 and interpreted as substantial to almost perfect agreement between the observed and classified values, provide a solid indication of the robustness of the designed classification model.

Considering the accuracy results and the possible influence of the different variables, Table 9 reports the importance score of each variable for each model application, and therefore for each site; in particular, the score was computed for the 20 most used variables. The most used variables with the highest scores are mainly related to terrain parameters (mean_dsm, min_curvature, and profile_curvature), followed by spectral variables such as NGRDI and GBVI. This shows that the classifier preferentially uses terrain parameters to distinguish the elements in the area of interest before relying on spectral variables and their derivatives.

Volume estimation and comparison

The volume estimates obtained through the field survey are highly comparable with those already presented in the literature for both tree genera considered. Regarding the Eucalypt sites, the volume estimates were not affected by the harvesting system. In their study, du Toit et al. (2000) estimated a potential residue amount of 40 dry t ha−1 over their study areas, which corresponds to a volume of 78 m3 ha−1 (considering an average specific gravity for Eucalypt of 0.511 g cm−3). These values are comparable with those obtained from the study areas presented here. In another study, du Toit (2008) reported information about sites with different residue treatments from his own research and previous work (Bradstock 1981; Tandon et al. 1988; Hunter 2001; Campion et al. 2006). He reported and compared a range of residue values (in this case branches and bark) between 12.51 and 22.98 t ha−1, within which the FWD averages from E1, E2 and E3 fall (17.69 t ha−1, 20.14 t ha−1 and 13.07 t ha−1, respectively).
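The mass-to-volume conversion used above for the du Toit et al. (2000) figure can be written out explicitly, noting that a specific gravity of 0.511 g cm−3 is equivalent to 0.511 t m−3:

$$V=\frac{40\ \mathrm{t\ ha^{-1}}}{0.511\ \mathrm{t\ m^{-3}}}\approx 78.3\ \mathrm{m^{3}\ ha^{-1}}$$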

On the other hand, the literature available on Pine sites does not allow as clear a comparison as for Eucalypt. This is primarily because of the destination of the timber: in this case, the pines were grown on a sawtimber regime, resulting in heavier and bigger branches compared to those of trees cultivated for pulp. Our data fall between these two categories, as reported and defined by Ross and du Toit (2004), also considering the quantity of residues.

When comparing the drone-based and field-based estimates, the results show a clear division at species level. In general, FWD volume tends to be more overestimated (2.33 for Eucalypt and 1.46 for Pine) than CWD volume (1.01 for Eucalypt and 0.28 for Pine). In particular, on Eucalypt sites there is a higher degree of variation in volume estimation between sites for both FWD and CWD. For example, the drone-based estimation for the E3 site produced a fourfold overestimation compared to the field volume, whereas CWD for E2 and E3 is slightly underestimated (0.80 and 0.77, respectively). These uncertainties may be attributed to two main reasons: (i) the choice of the specific pixel size as unit of investigation and (ii) the packing of residues. In the first case, the chosen resolution neglects the finer residue classes (1 h and 10 h), making it hard to assess their contribution to the volume estimation. In the second case, the packing of residues in layers poses an issue at both field and processing level: residues piled on top of each other mask material of the other classes, making it hard to obtain a correct volume estimate. Compared to this study, Windrim et al. (2019) obtained a more positive outcome for CWD (R2: 0.545–0.587), probably due to their different segmentation and elaboration methodology, which is more suitable for object detection and segmentation. The results of Shokirov et al. (2021) are more comparable to the ones obtained here (R2: 0.05–0.26), since they considered true positives and false positives together, although they considered isolated CWD with only grass cover as background.

In general, there might be different reasons for the non-correspondence of the estimates.

i. Classification errors. In particular for the woody debris classes, considering the model averages, between 8 and 20% of misclassification error is present. As classification is the first step of the presented methodology, these errors could have propagated to the subsequent steps.

ii. Resolution. Unlike the drone-based estimates, the field estimates were also designed to count the very fine material (Table A 1), which was not possible with the remotely sensed information, even after the resampling.

iii. Terrain interpolation. The methodology proposed by Straffelini et al. (2021) was developed and applied to areas with different land-use destinations and characteristics. Therefore, in adapting the solution proposed by the authors, the kernel size was dimensioned on the CWD size rather than on a FWD pile, which would have required a larger moving window and increased the computational time.

iv. Packing of residues. To adjust for the differences between the two estimation techniques, a residue packing ratio could have been applied to the measured volumes. However, to the best of our knowledge, no specific packing ratios are yet available in the literature for Eucalypt and Pine. A valid example is provided by Hardy (1996), where the ratios were calibrated on different species but with characteristics similar to the ones presented in this study in terms of residue dimensions. However, applying these ratios to the data mentioned above would generate more errors, since residue piles were not considered as a single unit in the field sampling. Still, the lack of a species-specific packing ratio is an issue to be considered.

Conclusion

The aim of this study was pursued in two steps: (i) to assess the distribution of logging residues using a ML classification model fed with drone-based images, and (ii) to compare the volume estimated from the classified images with the volume obtained from in-field observation. The results indicate that the proposed methodology is suitable for evaluating the distribution of residues in a post-logging scenario with high overall accuracy (0.89), even when considering two different species with different characteristics. For volume estimation, this study proposed a novel approach relying entirely on UAV-borne images while also considering terrain-related variables. Overall, it produced encouraging results with positive correlation, but it could be improved by addressing the points discussed above (classification errors, resolution, terrain interpolation and packing of residues), especially the terrain interpolation from drone imagery and the development of a local packing ratio to correct the estimates. Moreover, results like these could support a better assessment of carbon stocks in harvesting sites by implementing a carbon budget that also considers the residues as a temporary carbon reservoir. In the future, high-resolution lidar DTMs will be more easily accessible and will greatly help in developing the baseline for volume estimations.