Introduction

Miombo woodlands, the dry tropical forests spanning large areas of southern Africa, directly support many millions of livelihoods in various ways, including the supply of plant-based materials, fertile soils for agriculture, and grazing lands1. These ecosystems also hold cultural and spiritual significance, provide habitat for substantial plant and animal biodiversity, and regulate both the climate and water resources2. These landscapes, however, are changing because of human activities, with cover reducing from approximately 2.7 to 1.9 million km² between 1980 and 20203. Given both their importance and their dynamic nature, it is crucial to monitor how the world’s miombo woodlands are changing.

One essential climate variable that requires accurate and precise monitoring is the aboveground biomass (AGB) and carbon stored in these woodlands4. Uncertainty in the quantification of these stocks has consequences, particularly misinformed policy and decision making, and the misallocation of funding and resources5,6. Carbon markets, for example, through programmes such as Reducing Emissions from Deforestation and Degradation (REDD+)7, require low uncertainty in estimates of carbon stocks if they are to properly incentivise direct climate benefits, and co-benefits including biodiversity and ecosystem services, by safeguarding these woodlands. Further, the greenhouse gas emissions reductions intended by international climate agreements, such as the Paris Agreement and the Nationally Determined Contributions of individual countries, are premised on forest carbon accounting with low uncertainty8. That is, both high accuracy and high precision, quantitatively expressed as bias and variance, respectively, are usually important for any estimate of forest AGB stocks in these contexts. Whilst accuracy is the principal concern in accounting (systematic over- or under-estimation correspondingly misleads understanding of forest carbon sequestration and emissions potential)9, precision also matters, not least for detecting change over time (differences between imprecise observations can be difficult to interpret)10. This is particularly the case for miombo woodlands, given the aforementioned pace of their anthropogenic change.
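In standard statistical terms, the accuracy and precision referred to above are the two components of the mean squared error (MSE) of an estimate θ̂ of a true stock θ: the squared bias expresses (in)accuracy and the variance expresses (im)precision. This textbook identity is included only to make the terminology explicit.

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta}-\theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{\theta}]-\theta\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{\theta}-\mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance}}
```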

The conventional approach to quantifying region-scale forest AGB stocks across miombo woodlands, and forests generally, within UNFCCC- and IPCC-compliant greenhouse gas inventories combines activity data and emissions factors (EF): remotely sensed estimates of forest area are multiplied by values of expected AGB per unit area of forest11. These expected values, based on in-situ measurements, might be generated from National Forest Inventories (NFI) or, where such data are unavailable, taken from the literature, such as IPCC defaults12. While this approach can be readily implemented, it has limitations, including: (i) restricted ability to describe AGB variation within forest types; (ii) EFs that are not representative of the forest in question; and (iii) failure to detect change beyond the binary transition between forest and non-forest (e.g., degradation).

For example, when focusing solely on the EF, and ignoring immediate questions surrounding the representativeness of applying a single value to any particular region of miombo woodland, uncertainties arise from the methods used to gather the in-situ data from the forest plots underlying the EFs themselves13. A ubiquitous feature of such measurements is the application of allometric models to estimate individual tree AGB. These models characterise the correlations that exist between tree shape and mass, enabling AGB estimation from more readily-measurable predictor variables such as stem diameter and tree height14. Such allometrics are themselves calibrated using hard-won destructive weighing measurements collected from a limited number of harvested trees that then must represent the entire variability of the specific taxa or region where that model is subsequently applied.
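As a concrete illustration of what such an allometric model looks like, the sketch below evaluates the widely used pan-tropical equation of Chave et al. (2014), AGB = 0.0673 × (ρD²H)^0.976, with AGB in kg, stem diameter D in cm, height H in m and basic wood density ρ in g/cm³. It is included only to show the general form of these models; we do not assume it is any specific model cited in this study, and the example trees are hypothetical.

```python
def chave2014_agb_kg(diameter_cm: float, height_m: float, wood_density_g_cm3: float) -> float:
    """Pan-tropical allometry of Chave et al. (2014):
    AGB = 0.0673 * (rho * D**2 * H) ** 0.976,
    with AGB in kg, D in cm, H in m and rho (basic wood density) in g/cm^3."""
    return 0.0673 * (wood_density_g_cm3 * diameter_cm ** 2 * height_m) ** 0.976


# Hypothetical trees: doubling stem diameter at fixed height and wood density
# raises predicted AGB by a factor of 2 ** (2 * 0.976), i.e. roughly 3.9.
small = chave2014_agb_kg(30.0, 15.0, 0.6)
large = chave2014_agb_kg(60.0, 15.0, 0.6)
print(f"D = 30 cm: {small:.0f} kg; D = 60 cm: {large:.0f} kg; ratio {large / small:.2f}")
```

Because predicted mass rises steeply with diameter, any systematic error in the fitted exponent is amplified for the largest trees, which is where calibration data are typically scarcest.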

Uncertainties in allometric-derived AGB predictions therefore arise from the selection, measurement and modelling of these calibration trees, and the measurement of the predictor variables of any out-of-sample tree15. Several studies have explored the precision of allometric predictions of tropical and subtropical forests, where the expectation is that uncertainties range from 10 to 40% of the estimate itself at the hectare-scale16,17. Research has also explored their accuracy, with a particular focus on the selection and modelling of allometric calibration data18, which are routinely heavily skewed towards small trees owing to their relative ease of harvesting. It has been hypothesised that this, combined with inadequate statistical methods, might cause biased AGB predictions for underrepresented larger trees19. Concurrently, independent lidar-based methods for AGB estimation have shown large differences versus allometry, estimating up to 1.77 times greater stocks at the plot-scale20. These potential uncertainties in allometric predictions are problematic as they would propagate directly into derived EFs and their aforementioned applications.

Here, we present the first (to our knowledge) mapping of region-scale AGB stocks generated entirely independently of the above conventional methods, including activity data, EFs and allometrics, using 3D multi-scale lidar (MSL) data acquired across 50 kha of forests in and around Gilé National Park, Mozambique (Fig. 1). The continuous region of interest (ROI) where these data were collected was selected to span intact forests, secondary forests in various states of degradation, and clearland, so that the data could reasonably be considered representative of miombo woodland landscapes more widely. Across the ROI, the MSL dataset (approximately 450 billion measurements) comprised helicopter-based airborne laser scanning (ALS) across its entirety, unoccupied aerial vehicle laser scanning (UAV-LS) from six 300 ha sections, and terrestrial laser scanning (TLS) and conventional forest inventory measurements from six coincident 1 ha plots.

Fig. 1: Multi-scale lidar across Gilé National Park, Mozambique.

a Approximately 450 billion laser scanning measurements were acquired in a 50 kha region of interest (ROI) located across the southeast corner of the park, capturing core area, buffer zone and beyond, such that the ROI encompassed intact miombo woodlands through to clearland (CRS is EPSG:32737). Helicopter-based airborne laser scanning (ALS) data were collected across its entirety, whilst slow-flying unoccupied aerial vehicle laser scanning (UAV-LS) data were acquired across six 300 ha sections. Terrestrial laser scanning (TLS) and conventional inventory data were collected in six 1 ha plots coinciding with these sections. b Location of Gilé National Park in wider Mozambique. c Example of coincident TLS, UAV-LS and ALS point clouds from a 10 m² section of forest (coloured by reflectance). Map data © 2024 Microsoft.

An inverted pyramid approach was used to estimate AGB stocks and uncertainties across the ROI from these MSL data (Fig. 2), whereby each layer calibrated the next. It commenced with the TLS point clouds, from which individual tree AGB was estimated by combining quantitative structural models (QSMs), which explicitly reconstruct whole-tree woody architecture and volume21 (Fig. 3), with estimates of basic woody tissue density22. These estimates were themselves calibrated and validated using a literature sample of destructive measurements23. TLS-derived AGB was gridded and used to train extreme gradient boosting machine learning models24, with predictor variables retrieved from the UAV-LS point clouds that describe forest structure25,26 (e.g., canopy height, tree fractional cover, and voxel occupancy rates describing the 3D distribution of woody plant material); this step was then repeated to upscale to the ALS. The optimisation and performance of these models were evaluated using spatial cross-validation methods27, and the resulting confidence intervals capture uncertainties arising from both the QSM-derived training data and the upscaling itself, providing a robust understanding of the uncertainty in AGB predictions.

Fig. 2: Estimating forest aboveground biomass from multi-scale lidar.

a TLS-derived estimates of gridded AGB (10 m resolution) were generated for the six 1 ha plots (Fig. 1a) via quantitative structural models describing the woody architecture and volume of individual trees (Fig. 3a). b Upscaled AGB across the six 300 ha sections was estimated through gradient boosting machine learning, using TLS estimates as training data, and metrics of forest structure retrieved from the UAV-LS data as predictor variables. c This upscaling step was repeated to produce AGB estimates across the ROI, and also shown here are the uncertainties associated with each pixel prediction. Models were evaluated using spatial cross-validation methods, and uncertainty quantification captured components arising from both the upscaling and the underlying TLS training data. d, e Examples of the predictor variables generated from the UAV-LS and ALS point clouds including but not limited to canopy height, tree fractional cover and voxel occupancy rates (a proxy for the 3D distribution of woody volume).

Fig. 3: Deviations between large tree aboveground biomass estimates from terrestrial laser scanning and allometry.

a Illustration of one quantitative structural model derived from the TLS point clouds (here of a Pterocarpus angolensis) that, coupled with species-specific basic woody tissue density, enables estimation of tree-scale AGB. b The cumulative distribution of TLS- and allometric-derived AGB across the 1,071 trees matched in both the TLS and inventory data across the six 1 ha plots (Fig. 1), ordered by decreasing stem diameter. Allometric estimates were generated from three appropriate models: two miombo woodland specific models29 (predictor variables: stem diameter only [red], and stem diameter and tree height [orange]) and a pan-tropical allometry30 (predictor variables: stem diameter, tree height and basic woody tissue density [purple]). c Summed AGB estimates across these trees, whereby the percentage decrease between TLS- and allometric estimates is shown. The dotted lines show the contribution to AGB from the 115 largest trees (by stem diameter), where it can be seen that: (i) these ~11% of trees contributed ~50% of summed AGB, and (ii) allometric estimates were systematically smaller than TLS counterparts.

These new MSL-derived AGB estimates provide insights into the accuracy and precision of current best practices, and show that between 51 and 118% more AGB is stored across these miombo woodlands than conventional methods suggest. This is first demonstrated at the tree scale by directly comparing TLS and allometric estimates for more than 1,000 trees (Fig. 3). We then show how these tree-level discrepancies partly translate into large differences at the region scale, by comparing MSL-derived AGB stocks across the 50 kha ROI with counterparts estimated both from activity data and EFs (Fig. 4) and from more direct mapping methods, using AGB products from the NASA GEDI spaceborne lidar mission28 (Fig. 5). In the Discussion, we explore the likely drivers of these differences and examine these results in the context of miombo woodlands and global change more widely, particularly the consequences for their protection and restoration.

Fig. 4: Multi-scale lidar predicts consistently larger aboveground biomass stocks across these miombo woodlands than conventional methods using activity data and emissions factors.

a MSL-derived forest/non-forest map across the 50 kha region of interest, overlaid on the boundaries of Gilé National Park, generated by thresholding tree fractional cover greater than or equal to 30% using a canopy threshold of 5 m at 10 m resolution. b Distribution and mean (green) of MSL-derived aboveground biomass density predictions (Fig. 2) for pixels considered forested, versus a representative selection of four EFs (red, orange, purple and blue) taken from IPCC defaults, Mozambique’s Forest Reference Emission Levels, and literature on miombo woodlands12,31,32,33. c Summed AGB stocks across the ROI (including uncertainty in the MSL-derived estimate), whereby these EFs were combined with the FNF map. It is observed that the MSL approach estimated AGB stocks between 51 and 118% larger than these conventional methods, with a mean increase of 74%. Map data © 2024 Microsoft.

Fig. 5: Comparison between aboveground biomass estimates from multi-scale lidar and the GEDI spaceborne lidar mission.

a Illustration of the 18,611 GEDI footprints available across the ROI overlaid on the MSL-derived predictions (Fig. 2). b Comparison between AGB density estimated via MSL and GEDI methods for these footprints. GEDI estimates are derived from a model for deciduous broadleaved trees in Africa (DBT.Af) using aboveground relative heights as predictor variables34. It is observed that MSL predictions are generally larger for densities greater than 50 Mg/ha. c This difference was examined by simulating GEDI waveforms and retrieving GEDI-perceived relative height metrics from the ALS data35, which were subsequently used to generate estimates of AGB density from the DBT.Af model. This subplot compares these estimates with MSL-derived AGB: the simulated GEDI estimates are typically larger at lower densities and smaller at higher densities. That is, the agreement between MSL- and GEDI-derived AGB is partly owing to these differences offsetting one another. Both scatter plots show an identity line (green), a linear regression with free intercept (black), and statistics including the concordance correlation coefficient (CCC) and root mean square difference (RMSD).

Results

Divergence between small and large trees

A total of 1,071 individual trees were explicitly matched in both the TLS and inventory data. Their TLS-derived AGB summed to 462.0 Mg, compared with 450.8, 421.9 and 414.0 Mg predicted from two miombo woodland-specific allometric models and one widely used pan-tropical allometric model29,30 (Fig. 3b, c); that is, the TLS sum was 2.5%, 9.5% and 11.6% larger than these allometric sums, respectively. Approximately 50% of AGB was stored in the largest 115 trees by stem diameter (i.e., 11% of trees). For these trees, the differences between methods were more marked, with sums of 232.0 Mg vs. 215.0, 198.6 and 197.8 Mg (TLS 7.9%, 16.9% and 17.3% larger, respectively). That is, a systematic trend was observed whereby allometric predictions were similar to TLS estimates for small trees, but smaller for large trees.
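Relative differences can be expressed against either sum, and the two choices give different percentages. The reported values for all trees (2.5%, 9.5% and 11.6%) are consistent with dividing the TLS-minus-allometric gap by the allometric sum; dividing by the TLS sum instead gives 2.4%, 8.7% and 10.4%. The sketch below reproduces both from the rounded sums reported above; the model labels are placeholders, since the text does not say which miombo model gives which sum.

```python
# Summed AGB (Mg) reported in the text for all 1,071 matched trees and for the
# largest 115 trees; model labels are placeholders for the three allometries.
tls_all = 462.0
allometric_all = {"miombo_1": 450.8, "miombo_2": 421.9, "pantropical": 414.0}
tls_large = 232.0
allometric_large = {"miombo_1": 215.0, "miombo_2": 198.6, "pantropical": 197.8}


def pct_tls_larger(tls: float, allometric: float) -> float:
    """TLS excess expressed as a percentage of the allometric sum."""
    return 100.0 * (tls - allometric) / allometric


def pct_allometric_smaller(tls: float, allometric: float) -> float:
    """Allometric shortfall expressed as a percentage of the TLS sum."""
    return 100.0 * (tls - allometric) / tls


for label, tls, sums in (("all trees", tls_all, allometric_all),
                         ("largest 115", tls_large, allometric_large)):
    for model, allo in sums.items():
        print(f"{label:12s} {model:12s} "
              f"TLS larger by {pct_tls_larger(tls, allo):5.1f}%  "
              f"allometry smaller by {pct_allometric_smaller(tls, allo):5.1f}%")

share_largest = 100.0 * tls_large / tls_all  # share of summed TLS AGB in the largest 115 trees
print(f"Share of TLS AGB in the largest 115 trees: {share_largest:.1f}%")
```

From these rounded inputs the second large-tree comparison gives 16.8% rather than the reported 16.9%, presumably because the reported figure was computed from unrounded sums.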

Region-scale differences

MSL-derived AGB (Fig. 2c) across the ROI summed to 3.85 Tg ± 11.0% (uncertainty expressed as 90% confidence intervals), with uncertainty at the individual pixel level (10 m resolution) averaging 60.4%. This estimate reduced to 3.65 Tg, with an average AGB density of 98.4 Mg/ha, when considering forested area only (37 kha, derived via an MSL-based forest/non-forest mask defined as tree fractional cover greater than or equal to 30% using a canopy threshold of 5 m at 10 m resolution31). Estimated AGB densities were 99.2, 100.3 and 86.5 Mg/ha for the core park, buffer zone and beyond, respectively. Conventional AGB estimates via activity data and EFs ranged from 1.67 to 2.42 Tg (mean: 2.10 Tg) across the ROI (Fig. 4c), generated using the same mask and four representative EF values of 65.2, 62.2, 45.1 and 54.0 Mg/ha (IPCC default for African subtropical dry forests, Mozambique’s Forest Reference Emission Levels, and two literature values for miombo woodlands, respectively12,31,32,33).
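The activity data × EF arithmetic behind these conventional estimates can be checked directly from the reported values. In the sketch below the forested area is derived from the reported 3.65 Tg and 98.4 Mg/ha (about 37.09 kha) rather than taken as the rounded 37 kha; this derivation is our assumption about how the reported figures relate, but it reproduces the 1.67 to 2.42 Tg range and the 51%, 118% and 74% uplifts.

```python
# Values reported in the text (Mg/ha and Tg); forest area is backed out from the
# reported forested total and mean density rather than the rounded "37 kha".
msl_forest_agb_tg = 3.65
msl_mean_density = 98.4
forest_area_ha = msl_forest_agb_tg * 1e6 / msl_mean_density   # about 37,090 ha

emission_factors = {          # Mg AGB per ha of forest
    "IPCC default": 65.2,
    "Mozambique FREL": 62.2,
    "miombo literature (a)": 45.1,
    "miombo literature (b)": 54.0,
}


def stock_tg(area_ha: float, ef_mg_per_ha: float) -> float:
    """Activity data x emissions factor; 1 Tg = 1e6 Mg."""
    return area_ha * ef_mg_per_ha / 1e6


stocks = {name: stock_tg(forest_area_ha, ef) for name, ef in emission_factors.items()}
mean_ef = sum(emission_factors.values()) / len(emission_factors)
mean_stock = stock_tg(forest_area_ha, mean_ef)

for name, s in stocks.items():
    print(f"{name:22s} {s:.2f} Tg  (MSL uplift {100 * (msl_forest_agb_tg / s - 1):.0f}%)")
print(f"Mean EF {mean_ef:.1f} Mg/ha -> {mean_stock:.2f} Tg "
      f"(MSL uplift {100 * (msl_forest_agb_tg / mean_stock - 1):.0f}%)")
```

The 74% figure corresponds to comparing with the stock from the mean EF; averaging the four individual uplifts would instead give roughly 77%.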

Comparison with GEDI

Overall, there was some agreement between MSL- and GEDI-derived AGB, with mean densities of 78.1 and 68.5 Mg/ha, respectively, for the 18,611 GEDI footprints available across the ROI, although MSL estimates were generally larger for densities greater than 50 Mg/ha (Fig. 5b). These differences were explored by considering the African deciduous broadleaf forests model underlying GEDI-derived AGB (predictor variables: aboveground relative heights)34, and by simulating GEDI-perceived waveforms and metrics from the ALS data35. This provided insight into the performance of this model, whose predictions, compared with MSL counterparts, were typically larger at lower densities (< 50 Mg/ha) and smaller at higher densities (Fig. 5c). That is, the overall agreement was partly owing to differences at high and low densities offsetting one another.
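Fig. 5 summarises agreement with the concordance correlation coefficient (CCC) and the root mean square difference (RMSD). A minimal numpy sketch of both follows, using Lin's definition with population (1/n) moments. The example densities are hypothetical, chosen to mimic the offsetting pattern described above, where a small mean difference can coexist with systematic disagreement at low and high densities.

```python
import numpy as np


def concordance_correlation(x, y) -> float:
    """Lin's concordance correlation coefficient, using population (1/n) moments:
    2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return float(2.0 * cov / (x.var() + y.var() + (mx - my) ** 2))


def rmsd(x, y) -> float:
    """Root mean square difference between paired estimates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


# Hypothetical footprint densities (Mg/ha): the second estimator is high at low
# densities and low at high densities, so its mean stays close to the reference.
reference = np.array([20.0, 40.0, 60.0, 100.0, 150.0])
other = np.array([35.0, 50.0, 62.0, 85.0, 125.0])
print("mean difference:", float(np.mean(other - reference)))
print("CCC:", concordance_correlation(reference, other))
print("RMSD:", rmsd(reference, other))
```

Unlike Pearson's correlation, the CCC penalises departures from the identity line, so a regression slope below one (as in the offsetting pattern) lowers it even when the means nearly agree.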

Discussion

Here, we presented the first region-scale mapping of forest AGB stocks driven by direct 3D measurements of forest structure, independent of conventional methods, including allometrics. Importantly, these estimates carry a credible uncertainty of 11.0% of the region-scale AGB estimate itself. We note that even with these first-of-their-kind MSL measurements capturing samples of the structure of each individual tree across the 50 kha region, pixel-level (10 m resolution) uncertainty frequently exceeded 60% (averaging approximately 36% and 27% when aggregated to 30 and 100 m resolution, respectively). That is, these miombo woodlands exhibit pronounced variation in structure and woody tissue density, part of which remained uncaptured by either the MSL data themselves or, more likely, the developed processing methods. These maps, then, are likely inappropriate for small-scale applications such as individual tree AGB estimation36, but suitable for enabling accurate local and regional carbon accounting through calibration and validation of Earth observation instruments such as GEDI and the upcoming ESA BIOMASS mission37.

The principal insight from these MSL methods is that 51–118% more AGB is stored across the ROI than predicted by conventional methods. A key driver of these differences was observed at the tree level, where TLS-derived AGB estimates were systematically greater than allometric counterparts for large trees. This was also reflected in the GEDI analysis, where the AGB model for African deciduous broadleaf forests34, itself underpinned by allometry, generally predicted lower values than MSL-derived methods at higher densities (> 50 Mg/ha). The importance of this point is magnified by the disproportionate contribution of large trees to upscaled AGB, as observed here (i.e., 11% of trees contributed more than 50% of AGB across the six 1 ha stands) and described in the literature38. We cannot definitively state whether this is due to over- or under-estimation by either method, or some mixture thereof, as accompanying destructive harvest data were not acquired. However, our TLS-derived estimates were calibrated using a representative sample of destructive measurements from the literature23. The tendency for allometric and TLS methods to produce biased and unbiased estimates for large trees, respectively, is consistent with studies in which estimates from both methods were compared with coincident destructive measurements across various forests39,40.

The cause of this potential systematic underestimation remains an open question. One possibility, however, is that all widely used allometrics are modelled via log-transformed linear regression41 and then applied to trees of all sizes (often with the caveat that a model should not be used to predict an out-of-sample tree whose size falls outside the range of the calibration data42). It is implicitly assumed, then, that model parameters are as suitable for large trees as they are for small trees, in terms of the accuracy of predictions. However, the underlying calibration data are, as a rule, skewed towards smaller trees, often necessarily because of the increasing difficulty of harvesting larger trees43. For example, for the miombo woodland-specific and pan-tropical models considered here29,30, the median stem diameter of the 167 and 4004 trees comprising the calibration data was 30 and 15 cm (mean: 35 and 24 cm), respectively. This compares with a median and mean stem diameter of 48 and 50 cm for the largest 10% of trees across the six 1 ha stands. Because each observation is usually assigned equal weight in linear regression44, these aggregate models are unlikely to be representative of large trees, which would lead to biased predictions if this assumption of parameter invariance is invalid19. Given the aforementioned role of large trees in driving AGB distributions, and limited study on this subject15,18,20, an argument from parsimony would be that large-tree allometric predictions are less certain than small-tree estimates, unless proven otherwise.
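The mechanism hypothesised above can be illustrated with synthetic data. The sketch below assumes a hypothetical "true" allometry whose log-log relationship curves slightly upwards for large trees (so the parameter-invariance assumption fails), draws a calibration sample skewed towards small trees, fits ordinary least squares to log-transformed data with equal weights, and compares back-transformed predictions with the truth. All coefficients, the curvature term and the sample design are assumptions chosen for illustration; the sketch shows how such a bias could arise, not that it does arise for the models cited here.

```python
import numpy as np

rng = np.random.default_rng(42)


def true_log_agb(d_cm):
    """Hypothetical 'true' ln(AGB, kg): a power law plus slight upward curvature."""
    x = np.log(d_cm)
    return -2.0 + 2.4 * x + 0.15 * (x - np.log(15.0)) ** 2


# Calibration sample skewed towards small trees (median diameter ~15 cm).
n = 1000
d_cal = np.exp(rng.normal(np.log(15.0), 0.5, size=n))
y_cal = true_log_agb(d_cal) + rng.normal(0.0, 0.3, size=n)

# Ordinary least squares on log-transformed data; every tree weighted equally.
X = np.column_stack([np.ones(n), np.log(d_cal)])
(a, b), *_ = np.linalg.lstsq(X, y_cal, rcond=None)


def predicted_agb(d_cm):
    """Back-transformed prediction (the usual bias-correction factor is omitted)."""
    return np.exp(a + b * np.log(d_cm))


for d in (15.0, 50.0):
    ratio = predicted_agb(d) / np.exp(true_log_agb(d))
    print(f"D = {d:4.0f} cm: predicted / true AGB = {ratio:.2f}")
```

With these settings the fitted model is close to the truth near the median calibration diameter but under-predicts a 50 cm tree, because the few large calibration trees carry little weight against the many small ones. No such size-dependent divergence would appear if the true relationship were an exact power law.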

The MSL methods used here provide the capability to resolve this issue and further enhance conventional methods. Whilst the airborne components presented here are at a more experimental stage, TLS methods are closer to operational readiness and less cost-prohibitive. These data can be collected from 1 ha plots within days, using sampling protocols that comply with adopted good practices45. Data processing methods, including segmentation and structural modelling, are complex, but substantial progress has been made in recent years, particularly on automation46 and on avoiding overestimation of the volume of smaller trees and high-order branches47. TLS methods can be deployed in two ways. First, and most directly, plot-scale AGB can be estimated by summing contributions from individually modelled trees; applied across NFI networks, for example, this would enable updated EFs to be generated. Second, existing allometric models can be improved by augmenting their calibration datasets48. The key here would be generating datasets that are uniform across tree size through the addition of larger trees. This is appealing, as it would both reduce the uncertainty in highly practicable allometric methods and leverage the value of historic datasets. However, appending TLS-derived AGB observations to calibration datasets is non-trivial. Assumptions in linear regression include that the mean of the distribution of error in the dependent variable (i.e., AGB) is zero, and ideally that errors are neither autocorrelated nor heteroscedastic44. Such efforts therefore require careful design.

Returning, then, to the region-scale predictions: while the divergence between TLS- and allometric-derived AGB explains part of the overall difference, the differences are also driven by the selected EFs. That is, these values are underpinned not only by allometrics, but also by the sampling pattern of their underlying field plots, and how that differs from the composition of the ROI considered here. This was partially unpicked by stratifying the ROI into core park, buffer zone and beyond, where AGB densities still exceeded the mean of the selected EFs by 75%, 77% and 53%, respectively. An important contributor here is the long tail of large MSL-derived AGB observations driven by the aforementioned large trees (e.g., predictions greater than 150 Mg/ha contributed 36% of the total 3.65 Tg), and the statistical likelihood that these would be undersampled by randomly distributed field plots.

The question remains, then: what are the implications of these observed differences in AGB stocks for our understanding of miombo woodlands? That largely depends on the transferability of our results to the world’s 1.9 million km² of these forests3. The ROI where data were collected was deliberately positioned to capture as much as possible of the range of states and successions, and therefore variance in AGB, across the wider region. This capture of structural and taxonomic variation is illustrated by the inventory measurements across the six 1 ha plots (Table S1), where stem count, stem diameter and basal area ranged from 56 to 349, 10 to 75 cm and 1.9 to 20.9 m²/ha, respectively, and where 81 of the estimated 334 unique species across miombo woodlands were observed, including from the dominant Brachystegia and Julbernardia genera1. Further, tree fractional cover across the 50 kha ranged from 0 to 1, with a mean of 0.59. These traits coincide with the ranges observed more widely across the continent49, so we suggest it is not unreasonable to consider these sampled forests as at least somewhat representative of miombo woodlands more broadly.

Speculatively then, if we were to extrapolate our results across the world’s miombo woodlands, they potentially store in the region of 3.7 PgC more carbon in their AGB than currently estimated, assuming the mean of the considered EFs (56.6 Mg/ha) is uplifted by 74% (assuming 47% carbon content). It is also noteworthy that MSL methods detected an additional 0.20 Tg AGB stored across the ROI in land classified as non-forest, potentially increasing this delta still further, and emphasising that fragments of miombo woodlands have the potential to store significant quantities of carbon50.
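The extrapolation above is straightforward arithmetic, reproduced here from the quantities stated in the text (1.9 million km², mean EF of 56.6 Mg/ha, 74% uplift, 47% carbon content and 5.1 PgC/yr). It inherits all the assumptions of the extrapolation itself.

```python
area_km2 = 1.9e6          # global miombo extent (km^2), from the text
area_ha = area_km2 * 100  # 1 km^2 = 100 ha
mean_ef = 56.6            # mean of the considered EFs (Mg AGB/ha)
uplift = 0.74             # mean MSL uplift relative to those EFs
carbon_fraction = 0.47    # carbon content of dry biomass, as assumed in the text

extra_agb_mg = area_ha * mean_ef * uplift
extra_carbon_pgc = extra_agb_mg * carbon_fraction / 1e9   # 1 Pg = 1e9 Mg
share_of_annual_growth = extra_carbon_pgc / 5.1            # vs 5.1 PgC/yr

print(f"Additional AGB carbon: {extra_carbon_pgc:.2f} PgC "
      f"({100 * share_of_annual_growth:.0f}% of one year's atmospheric increase)")
```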

Whilst such extrapolation requires additional data for confirmation, the magnitude of this difference suggests that our understanding of the role these forests play in global change requires a rethink, given that this overall increase approaches the current annual growth in global atmospheric carbon (5.1 PgC/yr)51. That is, afforestation and reforestation of these forests could sequester more carbon than currently assumed while, equally, their loss could lead to greater emissions. Finally, an uplift in the carbon density of these forests per unit area could correspond to an increase by a factor of 1.5 to 2.2 in their value on carbon markets, thus better incentivising their protection and restoration, and disincentivising the value extracted from their deforestation52.

Materials and methods

Site description

The 50 kha region of interest (ROI) where data were collected (Fig. 1a) was located on the southeastern border of Gilé National Park, Zambezia Province, Mozambique. The forests here comprise woodlands, riverine forests and wooded savannas, dominated by trees from the Brachystegia and Julbernardia genera, characteristic of the broader classification of miombo woodland1. Mean annual precipitation is between 800 and 1000 mm, with a dry season from May to October53. Mean monthly temperature varies from the low teens to the high thirties (°C), the terrain is largely flat, and soils comprise sandy loam and sandy clay54.

Study design

The ROI was positioned such that it covered core park, buffer zone and beyond (Fig. 4a). The dataset (Fig. 1a) comprised airborne laser scanning (ALS) data across the entirety of the ROI (designation: GIL), unoccupied aerial vehicle laser scanning (UAV-LS) data from six 300 ha sections (designation: GIL01 to GIL06), and terrestrial laser scanning (TLS) measurements and inventory data from six 1 ha plots coincident with these sections (designation: GIL01-01 to GIL06-01). These sections and plots were strategically located to capture variations in forest state, succession, structure, and taxonomy across the ROI.

Data collection

Data were acquired between June and November 2022. The six 100 × 100 m planimetric plots were established and inventoried using the RAINFOR protocol55. Measurements of each tree inside these plots with stem diameter ≥ 10 cm included: (i) stem diameter, via a circumference/diameter tape; (ii) point of measurement of the stem diameter (either 1.3 m above ground or 0.5 m above buttresses); (iii) taxonomic identity, determined by a single trained botanist; and (iv) x-y coordinates, by eye-estimation.

TLS data were collected in GIL01-01 through GIL06-01 using a RIEGL VZ-400i laser scanner. Sampling followed an established protocol56, in accordance with the CEOS Aboveground Woody Biomass Product Validation Good Practices Protocol45. In particular, scans were acquired from 121 locations at 10 m intervals across each plot, with upright and tilted scans acquired at each location to capture a complete sample of the scene. The instrument pulse repetition rate was 300 kHz and the angular step between sequentially fired pulses was 0.04 degrees. The laser pulse had a wavelength, pulse width, beam divergence and exit footprint diameter of 1550 nm, 3.0 ns, 0.30 mrad and 7.0 mm, respectively. Coarse georeferencing of scans was generated from an onboard GNSS receiver obtaining real-time differential corrections from a nearby static Emlid Reach RS2 GNSS receiver.

UAV-LS data were acquired across GIL01 through GIL06 using a RIEGL VUX-120 and Trimble Applanix APX-20 GNSS/INS kinematic laser scanning system mounted on a hybrid-electric drone, in a 50 m double-gridded configuration at a velocity of 5.0 m/s and a flying height of 100 m above ground level. The instrument pulse repetition rate, field of view and scan rate were 1800 kHz, 90 degrees and 315 lines per second, respectively. The laser pulse had a wavelength, pulse width, beam divergence and exit footprint diameter of 1550 nm, 3.0 ns, 0.38 mrad and 5.7 mm, respectively.

ALS data were acquired across GIL using the same kinematic laser scanning system mounted on a helicopter, in a parallel-line configuration with 127 m line spacing, at a velocity of 41.2 m/s and a flying height of 160 m above ground level. In this configuration, the pulse repetition rate, field of view and scan rate were 1200 kHz, 100 degrees and 396 lines per second, respectively. A nearby static Stonex S900A GNSS receiver collected observables throughout UAV-LS and ALS data acquisition for georeferencing purposes.
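The acquisition parameters above imply nominal swath widths and pulse densities, which help convey how differently the UAV-LS and ALS layers sample the canopy. The sketch below computes them under simplifying assumptions (flat terrain, every emitted pulse falling within the field of view, no losses at line ends, and "double-gridded" read as two perpendicular sets of 50 m lines); these derived figures are illustrative and are not reported in the source.

```python
import math


def swath_width_m(agl_m: float, fov_deg: float) -> float:
    """Across-track swath on flat ground for a symmetric field of view."""
    return 2.0 * agl_m * math.tan(math.radians(fov_deg / 2.0))


def mean_pulse_density(prr_hz: float, speed_m_s: float, line_spacing_m: float,
                       n_grids: int = 1) -> float:
    """Average emitted pulses per m^2 for parallel lines spaced line_spacing_m apart.
    Each second the platform advances speed_m_s and accounts for a strip
    line_spacing_m wide, independent of swath overlap; n_grids counts repeated
    (e.g. cross-hatched) sets of lines."""
    return n_grids * prr_hz / (speed_m_s * line_spacing_m)


configs = {
    # n_grids=2 assumes "double-gridded" means two perpendicular sets of 50 m lines.
    "UAV-LS": dict(prr_hz=1.8e6, speed=5.0, spacing=50.0, agl=100.0, fov=90.0,
                   scan_rate=315.0, n_grids=2),
    "ALS": dict(prr_hz=1.2e6, speed=41.2, spacing=127.0, agl=160.0, fov=100.0,
                scan_rate=396.0, n_grids=1),
}

for name, c in configs.items():
    swath = swath_width_m(c["agl"], c["fov"])
    density = mean_pulse_density(c["prr_hz"], c["speed"], c["spacing"], c["n_grids"])
    along_track = c["speed"] / c["scan_rate"]   # distance between successive scan lines
    print(f"{name}: swath {swath:.0f} m, ~{density:,.0f} pulses/m^2, "
          f"scan-line spacing {along_track * 100:.1f} cm")
```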

Data preprocessing

Inventory data (1406 trees) were manually digitised from field sheets, with accuracy assessed via a second operator digitising a randomly selected 5% subset. Errors in taxonomic identity were resolved using the Taxonomic Name Resolution Service57. Estimates of basic woody tissue density were derived from the mean of entries available in the Global Wood Density Database58, whereby attribution was made if possible at the species-, or else genus-level (84.4 and 14.2% of the trees, respectively). If no taxonomic attribution could be made, basal area-weighted plot average wood densities were used (1.4%).
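The density attribution hierarchy described above (species, then genus, then the plot basal area-weighted mean) can be sketched as below. The input structures and database values are hypothetical placeholders rather than Global Wood Density Database entries, and the sketch assumes every plot contains at least one attributed tree (otherwise the plot mean is undefined and a ZeroDivisionError is raised).

```python
import math
from collections import defaultdict


def basal_area_m2(d_cm: float) -> float:
    """Stem cross-sectional area (m^2) from diameter in cm (radius in m = D / 200)."""
    return math.pi * (d_cm / 200.0) ** 2


def _mean(values):
    return sum(values) / len(values)


def attribute_wood_density(trees, species_entries, genus_entries):
    """Return (density g/cm^3, level) per tree using a species -> genus -> plot fallback.

    trees: dicts with keys 'plot', 'species', 'genus' and 'd_cm'.
    species_entries / genus_entries: name -> list of database values.
    """
    first_pass = []
    for t in trees:
        if t["species"] in species_entries:
            first_pass.append((_mean(species_entries[t["species"]]), "species"))
        elif t["genus"] in genus_entries:
            first_pass.append((_mean(genus_entries[t["genus"]]), "genus"))
        else:
            first_pass.append((None, None))

    # Basal area-weighted mean density of the attributed trees in each plot.
    ba_total = defaultdict(float)
    ba_weighted = defaultdict(float)
    for t, (rho, _) in zip(trees, first_pass):
        if rho is not None:
            ba = basal_area_m2(t["d_cm"])
            ba_total[t["plot"]] += ba
            ba_weighted[t["plot"]] += ba * rho

    return [
        (rho, level) if rho is not None
        else (ba_weighted[t["plot"]] / ba_total[t["plot"]], "plot")
        for t, (rho, level) in zip(trees, first_pass)
    ]


species_entries = {"Brachystegia spiciformis": [0.60, 0.70]}   # hypothetical values
genus_entries = {"Julbernardia": [0.80]}                        # hypothetical value
trees = [
    {"plot": "GIL01-01", "species": "Brachystegia spiciformis", "genus": "Brachystegia", "d_cm": 30.0},
    {"plot": "GIL01-01", "species": "Julbernardia sp.", "genus": "Julbernardia", "d_cm": 10.0},
    {"plot": "GIL01-01", "species": None, "genus": None, "d_cm": 20.0},
]
result = attribute_wood_density(trees, species_entries, genus_entries)
print(result)
```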

TLS data were co-registered into georeferenced (EPSG: 32737) tiled point clouds (Fig. 1c) using RIEGL RiSCAN Pro (v2.15) via its Automatic Registration 2 and Multi Station Adjustment 2 modules. Airborne mission trajectories were refined to survey-grade accuracy and precision via Applanix POSPac UAV (v8.8) using GNSS observables from the base station, whose absolute positioning was refined using the AUSPOS service59. Lidar data were united with these trajectories and merged into georeferenced tiled point clouds (Fig. 1c) using RIEGL RiPROCESS (v1.9.2.2) including its RiUNITE (v1.0.3.3) and RiPRECISION (v1.4.2) modules. TLS point cloud georeferencing was refined by aligning with the UAV-LS data using an iterative closest point algorithm implemented in CloudCompare (v2.12.4). Noise in TLS, UAV-LS and ALS point clouds was labelled using reflectance and deviation thresholding60, and statistical outlier filtering. Incompletely sampled tiles were discarded based on point density and morphological erosion.
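Statistical outlier filtering is mentioned above without parameters. A minimal sketch of the common statistical outlier removal (SOR) scheme follows: each point's mean distance to its k nearest neighbours is compared with the cloud-wide distribution of that statistic. The values of k and the standard-deviation multiplier are hypothetical, and brute-force distances are used for clarity; a KD-tree (e.g. scipy.spatial.cKDTree) would be needed for clouds of realistic size.

```python
import numpy as np


def statistical_outlier_mask(points, k=8, n_std=2.0):
    """Flag points whose mean distance to their k nearest neighbours exceeds the
    cloud-wide mean of that statistic by more than n_std standard deviations.
    Brute-force O(n^2) distances, suitable only for small examples."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # ignore each point's distance to itself
    knn = np.sort(dist, axis=1)[:, :k]          # k smallest distances per point
    mean_knn = knn.mean(axis=1)
    threshold = mean_knn.mean() + n_std * mean_knn.std()
    return mean_knn > threshold                 # True = labelled as noise


# Hypothetical cloud: 200 points in a unit cube plus one isolated point.
rng = np.random.default_rng(0)
cloud = np.vstack([rng.uniform(0.0, 1.0, size=(200, 3)), [[10.0, 10.0, 10.0]]])
mask = statistical_outlier_mask(cloud, k=8, n_std=2.0)
print("points flagged as noise:", np.flatnonzero(mask))
```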

Data processing

TLS-derived tree-scale aboveground biomass (AGB) estimates were generated in six steps. First, point clouds representing individual trees located inside the plot, or with part of their AGB inside it, were segmented from the tiled point clouds (1,339 trees). This was undertaken both manually in CloudCompare (v2.12.4) and using the Forest Structural Complexity Tool46. Second, point clouds were manually linked with inventory data via stem maps; owing to edge effects around the plot (i.e., trees with stem base inside but crown partly growing outside the plot, and vice versa) and multi-stemmed trees, there were somewhat fewer point clouds than census entries (1,339 vs. 1,406, respectively). Third, leafy material, which had distinctively lower apparent reflectance than woody material, was segmented from the point clouds by thresholding, based on an evaluation of single-tree reflectance histograms. Fourth, quantitative structural models (QSMs) (Fig. 3a) were constructed for woody point clouds using TreeQSM21 (v2.4.1). QSMs were inspected by eye and validated via comparison with the input point clouds (i.e., point-to-cylinder distances). Fifth, potential overestimation of small branch volume arising from wind, co-registration error, properties of the laser pulse itself, or some mixture thereof, was mitigated using a post-processing step based on metabolic scaling theory61. This limited the maximum diameter of third-order and higher branches to half the diameter of their parent, and the diameter of each cylinder within a branch to no more than that of its parent cylinder. This step was validated using open-access data from previous studies for which destructive measurements were available23. Sixth, AGB was estimated from QSM-derived volume and basic woody tissue density, the latter obtained via the established link between point clouds and inventory data.
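The metabolic-scaling post-processing and the final volume-to-AGB step can be sketched on a simplified cylinder list. The Cylinder structure is hypothetical (it is not TreeQSM's output format), parents are assumed to precede their children, and the branch-order convention (stem = order 0) and the min_order threshold are assumptions. Because basic density in g/cm³ equals Mg/m³, volume multiplied by density gives AGB in Mg.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cylinder:
    radius_m: float
    length_m: float
    parent: Optional[int]  # index of the parent cylinder; None for the stem base
    branch: int            # branch identifier (stem = 0 in this sketch)
    order: int             # branch order (stem = 0 in this sketch; an assumption)


def cap_radii(cylinders, min_order=3):
    """Apply the two caps: a high-order branch's first cylinder is at most half its
    parent's radius, and a cylinder within a branch is at most its parent's radius."""
    capped = []
    for i, c in enumerate(cylinders):
        r = c.radius_m
        if c.parent is not None:
            assert c.parent < i, "parents must precede children"
            parent_r = capped[c.parent]              # use the already-capped parent
            if cylinders[c.parent].branch == c.branch:
                r = min(r, parent_r)                 # no widening along a branch
            elif c.order >= min_order:
                r = min(r, 0.5 * parent_r)           # high-order branch base
        capped.append(r)
    return capped


def qsm_volume_and_agb(cylinders, wood_density_g_cm3, min_order=3):
    """Return (volume in m^3, AGB in Mg) after radius capping."""
    radii = cap_radii(cylinders, min_order)
    volume_m3 = sum(math.pi * r * r * c.length_m for r, c in zip(radii, cylinders))
    return volume_m3, volume_m3 * wood_density_g_cm3   # g/cm^3 == Mg/m^3


# Hypothetical tree: two stem cylinders, a first-order branch, a third-order branch.
cyls = [
    Cylinder(0.20, 2.0, None, 0, 0),
    Cylinder(0.18, 2.0, 0, 0, 0),
    Cylinder(0.10, 1.0, 1, 1, 1),
    Cylinder(0.12, 1.0, 2, 1, 1),   # wider than its parent: capped to 0.10
    Cylinder(0.08, 1.0, 3, 2, 3),   # order 3 base: capped to half of 0.10
    Cylinder(0.06, 1.0, 4, 2, 3),   # capped to its parent's 0.05
]
radii = cap_radii(cyls)
vol, agb = qsm_volume_and_agb(cyls, 0.6)
print(radii, f"volume {vol:.3f} m^3, AGB {agb:.3f} Mg")
```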

Metrics describing forest structure (Fig. 2d, e) were generated from the UAV-LS and ALS point clouds at 10 m resolution unless stated otherwise. The metrics, described in25 and retrieved using methods similar to those implemented in lidR62, comprised: digital terrain and canopy height models (1 m resolution), relative height, tree fractional cover, canopy height rugosity, fixed and variable gap fraction, canopy closure, canopy ratio, z-entropy, skewness and kurtosis. Voxel-based metrics describing the 3D distribution of woody plant material were also retrieved26. To do so, leafy material was segmented via reflectance thresholding, and the UAV-LS and ALS point clouds were partitioned into voxels with 0.1 m and 0.5 m edge lengths, respectively. The volumes of all voxels containing at least one point were then aggregated into 3D grids of 1 m and 5 m resolution, respectively.
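A few of the listed height-distribution metrics, and the voxel-based woody volume, can be sketched for the returns of a single grid cell as below. This is a hedged illustration in the spirit of standard lidar area-based metrics rather than a reproduction of lidR's implementation. The 2 m cover threshold, the 1 m entropy bin width and the function names are assumptions. Heights are assumed to be normalised to ground level, and leaf/wood separation is assumed to have been done beforehand.

```python
import numpy as np


def cell_height_metrics(z, cover_height=2.0, bin_width=1.0):
    """Illustrative metrics for normalised return heights z (m) within one cell."""
    z = np.clip(np.asarray(z, dtype=float), 0.0, None)  # heights below ground set to 0
    out = {f"rh{p}": float(np.percentile(z, p)) for p in (25, 50, 75, 98)}
    out["cover"] = float(np.mean(z > cover_height))      # fraction of returns above threshold
    zs = (z - z.mean()) / z.std()
    out["skewness"] = float(np.mean(zs ** 3))
    out["kurtosis"] = float(np.mean(zs ** 4))            # non-excess kurtosis
    # z-entropy: Shannon entropy of the height histogram, normalised by log(n_bins)
    n_bins = max(1, int(np.ceil(z.max() / bin_width)))
    counts, _ = np.histogram(z, bins=n_bins, range=(0.0, n_bins * bin_width))
    p = counts[counts > 0] / counts.sum()
    out["z_entropy"] = float(-(p * np.log(p)).sum() / np.log(n_bins)) if n_bins > 1 else 0.0
    return out


def woody_voxel_volume(points, voxel=0.1, grid=1.0):
    """Occupied-voxel volume (m^3) aggregated into cubic blocks of edge `grid` (m)."""
    pts = np.asarray(points, dtype=float)
    # Unique occupied voxels: each counts once regardless of how many points it holds
    vox = np.unique(np.floor(pts / voxel).astype(np.int64), axis=0)
    centres = (vox + 0.5) * voxel
    blocks = np.floor(centres / grid).astype(np.int64)
    keys, counts = np.unique(blocks, axis=0, return_counts=True)
    return {tuple(int(v) for v in k): float(c) * voxel ** 3 for k, c in zip(keys, counts)}
```

Counting occupied voxels rather than points makes the volume estimate less sensitive to local point density, which differs markedly between UAV-LS and ALS.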

Aboveground biomass modelling

TLS-derived gridded estimates of AGB across GIL01-01 through GIL06-01 (Fig. 2a) were produced by constructing a georeferenced 10 m grid across each plot, decomposing each QSM into its constituent cylinders, and allocating cylinder volume to the respective cells. Where a cylinder spanned multiple cells, the share of its volume assigned to each cell was estimated by numerical approximation. Cells were considered valid only when all AGB from trees with stem diameter ≥10 cm was captured. This included contributions from trees outside the plot whose stem or crown only partially fell inside the cell in question. For such trees, QSMs were produced and the encroaching cylinder volumes attributed to the relevant cell, assuming a wood density equal to the plot's basal area-weighted wood density. In total, 568 cells, of which 473 were non-zero, were created with biomass contributions from 1,259 QSMs.
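One simple way to perform the numerical approximation mentioned above is Monte Carlo sampling. Points are drawn uniformly within a cylinder, and its volume is allocated to planimetric grid cells in proportion to the points falling in each. The sketch below assumes this approach and a grid anchored at the coordinate origin. The function name, sample count and grid origin are illustrative assumptions. Multiplying each cell's share by wood density would give its AGB contribution.

```python
import numpy as np


def allocate_cylinder_volume(start, axis, length, radius, cell=10.0, n=20000, seed=0):
    """Approximate how a cylinder's volume (m^3) is split among planimetric grid cells.

    start: (x, y, z) of the cylinder base; axis: direction vector (normalised here).
    Returns {(col, row): volume} for a grid of `cell` metres anchored at the origin.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    # Two unit vectors orthogonal to the cylinder axis
    helper = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(a, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(a, u)
    # Uniform samples inside the cylinder (sqrt gives uniform density over the disc)
    t = rng.uniform(0.0, length, n)
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))
    th = rng.uniform(0.0, 2.0 * np.pi, n)
    pts = (np.asarray(start, dtype=float) + t[:, None] * a
           + (r * np.cos(th))[:, None] * u + (r * np.sin(th))[:, None] * v)
    ij = np.floor(pts[:, :2] / cell).astype(np.int64)
    keys, counts = np.unique(ij, axis=0, return_counts=True)
    volume = np.pi * radius ** 2 * length
    return {(int(i), int(j)): volume * c / n for (i, j), c in zip(keys, counts)}
```

Because every sample is assigned to exactly one cell, the allocated shares always sum to the cylinder's analytical volume; only the split between cells carries sampling error.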

Gridded estimates of AGB (10 m resolution) across GIL01 through GIL06 (Fig. 2b) were retrieved using extreme gradient boosting via XGBoost63 (v1.6.2), which has previously been applied to AGB modelling24. The TLS-derived gridded AGB estimates were used as training data, and the spatially coincident UAV-LS-derived forest structure metrics were used as predictor variables. The choice of metrics and of the biomass map resolution (10 m) was informed by balancing the number of training pixels (favouring higher resolution) against the information contained within each pixel to predict biomass, using the rule of thumb that there should be at least ten times more training pixels than features. Hyperparameters and the feature subset were optimised by minimising the root mean square error (RMSE) of the validation folds within a spatial cross-validation framework27, using GIL01-01 through GIL06-01 as separate folds. For this, a random grid search over the following six hyperparameters was undertaken: (i) learning rate (a step-shrinkage weight used to make the boosting process more conservative and reduce the risk of overfitting); (ii) minimum loss reduction (the minimum reduction in loss required to make a further partition on a leaf node of the tree); (iii) maximum tree depth (a proxy for the complexity, and hence overfitting risk, of the model); (iv) minimum child weight (the minimum sum of the Hessian required in each child, with higher values again resulting in more conservative models); (v) subsample ratio (the fraction of training instances sampled prior to growing each tree); and (vi) column sampling ratio (the fraction of features subsampled). Further performance metrics were also generated, including bias and two metrics based on the log of the accuracy ratio: (i) median symmetric accuracy, and (ii) symmetric signed percentage bias. These two metrics are well suited to assessing predictions that may span several orders of magnitude64.
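The spatial cross-validation and random search can be sketched as follows, with each plot acting as one held-out fold. This is a hedged, library-agnostic illustration. The candidate values in `SEARCH_SPACE` are assumptions; only the keys follow XGBoost's parameter names for the six hyperparameters listed. Validation RMSE is averaged across folds, and feature selection is omitted. `fit_predict` is a caller-supplied function that in practice might wrap `xgboost.XGBRegressor(**params)`.

```python
import math
import random

# Candidate values are illustrative assumptions; keys follow XGBoost parameter names.
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],  # step shrinkage
    "gamma": [0.0, 1.0, 5.0],                 # minimum loss reduction
    "max_depth": [2, 3, 4, 6],
    "min_child_weight": [1, 5, 10],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.5, 0.8, 1.0],
}


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def leave_one_plot_out_rmse(params, X, y, plots, fit_predict):
    """Mean validation RMSE across folds, each fold holding out one plot."""
    scores = []
    for held_out in sorted(set(plots)):
        train = [i for i, p in enumerate(plots) if p != held_out]
        valid = [i for i, p in enumerate(plots) if p == held_out]
        pred = fit_predict(params,
                           [X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in valid])
        scores.append(rmse([y[i] for i in valid], pred))
    return sum(scores) / len(scores)


def random_search(X, y, plots, fit_predict, n_iter=20, seed=0):
    """Sample n_iter configurations from SEARCH_SPACE; return (best RMSE, params)."""
    rng = random.Random(seed)
    best_score, best_params = math.inf, None
    for _ in range(n_iter):
        params = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = leave_one_plot_out_rmse(params, X, y, plots, fit_predict)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

Keeping the model behind `fit_predict` makes it easy to confirm that no pixel from a held-out plot leaks into training, which is the point of spatial rather than random cross-validation.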

Gridded estimates of AGB (10 m resolution) across GIL (Fig. 2c) were retrieved by repeating this procedure, with the UAV-LS-derived gridded AGB estimates as training data and the spatially coincident ALS-derived forest structure metrics as predictor variables. Figs. S1 and S2 present the cross-validation statistics, the weight and gain assigned to each predictor variable, and scatter plots of predicted versus reference AGB for the UAV-LS and ALS models, respectively.
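The log-accuracy-ratio metrics used among the cross-validation statistics can be computed as sketched below, following their standard definitions. With Q = ln(predicted/reference), median symmetric accuracy is 100 × (exp(median|Q|) − 1) and symmetric signed percentage bias is 100 × sign(median Q) × (exp(|median Q|) − 1). Both require strictly positive values, so cells with zero AGB would need to be excluded or otherwise handled; how this was done here is not specified. The function name is illustrative, and bias is returned in absolute units.

```python
import numpy as np


def accuracy_metrics(pred, ref):
    """Bias, RMSE and log-accuracy-ratio metrics; requires strictly positive values."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if np.any(pred <= 0) or np.any(ref <= 0):
        raise ValueError("log accuracy ratio is undefined for non-positive values")
    q = np.log(pred / ref)  # log accuracy ratio
    med_q = np.median(q)
    return {
        "bias": float(np.mean(pred - ref)),
        "rmse": float(np.sqrt(np.mean((pred - ref) ** 2))),
        "msa_pct": float(100.0 * (np.exp(np.median(np.abs(q))) - 1.0)),
        "sspb_pct": float(100.0 * np.sign(med_q) * (np.exp(np.abs(med_q)) - 1.0)),
    }
```

Because both metrics work on ratios, a prediction of twice the reference and one of half the reference are penalised equally (100%), which is why they suit AGB values spanning orders of magnitude.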

A direct TLS-to-ALS upscaling model (i.e., skipping the intermediate UAV-LS layer) was also tested. It was slightly more biased (cross-validation bias against TLS labels: −3.63%) and less accurate (cross-validation RMSE against TLS labels: 62.6 Mg/ha), and was therefore not considered further in this study. Further improvements to the MSL workflow could include more sophisticated lidar features, perhaps tailored to the extremely high point density of UAV-LS, that may correlate more strongly with AGB than those used here. Additionally, spatially explicit models, such as convolutional neural networks, can account for spatial context rather than only the values of individual pixels. Here, spatial information is used only in the cross-validation of the XGBoost models.

Uncertainty quantification

Uncertainty in TLS-derived tree-scale AGB arises from the underlying point cloud, quantitative structural modelling and basic woody tissue density estimation43. Uncertainty from these sources was captured implicitly by modelling the expected error distribution using existing data where TLS-derived AGB estimates were available alongside reference measurements from destructive harvesting and weighing (391 trees from 111 species)23. To make these harvested trees more representative of the ROI, the dataset was restricted to trees with stem diameter <75 cm that were scanned in leaf-on conditions, excluding trees from boreal and temperate regions (n = 174). The error distribution was modelled via the mean and variance of the residuals as functions of stem diameter, using linear and non-linear quantile regression, respectively. The appropriate mean residual was subtracted from the raw AGB estimate of each tree to remove the bias known to arise in TLS-derived volume estimates, especially in smaller branches47, complementing the QSM post-processing based on metabolic scaling. Uncertainty in TLS-derived gridded AGB was derived as a volume-weighted combination of tree-scale AGB uncertainty, by modelling the true AGB of each tree as following a Gaussian distribution dependent on its TLS-estimated AGB and independent of all other trees. This independence assumption applies only to effects causing a discrepancy between TLS-estimated and true AGB (i.e., imperfections in lidar scanning, QSM reconstruction and wood density assignment). It does not neglect spatial correlation in true AGB (e.g., arising from the effect of trees on each other), which is inherited by TLS-estimated AGB. It nevertheless remains an approximation, since some residual effects, such as wind noise, may also be spatially correlated. Consequently, the only modelled source of covariance between pixels arises from individual trees spanning more than one pixel. This enabled calculation of the full pixel covariance matrix, capturing both the marginal uncertainty in each pixel and the correlation between pixels.
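Under these assumptions the pixel covariance follows from linear error propagation. Let w_pt be the fraction of tree t's volume falling in pixel p, and let each tree's AGB A_t have independent variance σ_t². Pixel AGB is then P_p = Σ_t w_pt A_t, so Cov(P_p, P_q) = Σ_t w_pt w_qt σ_t², or in matrix form C = W diag(σ²) Wᵀ. The sketch below is a minimal NumPy illustration of that identity; the function name and the toy values are assumptions.

```python
import numpy as np


def pixel_agb_covariance(weights, tree_mean, tree_sd):
    """Mean vector and covariance matrix of pixel AGB.

    weights[p, t]: fraction of tree t's volume in pixel p.
    Each tree's AGB is modelled as independent N(tree_mean[t], tree_sd[t]**2).
    """
    W = np.asarray(weights, dtype=float)
    mean = W @ np.asarray(tree_mean, dtype=float)
    cov = W @ np.diag(np.asarray(tree_sd, dtype=float) ** 2) @ W.T
    return mean, cov


# Tree 0 lies wholly in pixel 0; tree 1 is split evenly between pixels 0 and 1.
mean, cov = pixel_agb_covariance([[1.0, 0.5], [0.0, 0.5]], [100.0, 200.0], [10.0, 20.0])
# mean -> [200., 100.]; cov -> [[200., 100.], [100., 100.]]
```

In the example, the shared tree is the only source of covariance between the two pixels, as stated in the text.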

Uncertainty in UAV-LS-derived gridded AGB estimates was modelled as the sum of two independent components: measurement variance and model variance. Measurement variance was estimated using a Monte Carlo random sampling approach. One hundred samples of each of the six TLS-derived gridded AGB estimates were drawn from their underlying multivariate Gaussian distributions, described above. Each set of six sampled gridded AGB estimates was used to train a separate XGBoost model, producing 100 gridded AGB predictions for each UAV-LS section; the per-pixel sample variance across these predictions was taken as the measurement variance. Model variance was estimated as a zero-intercept linear function of predicted pixel AGB, with the slope calibrated from the cross-validation data described in the previous section. Uncertainty in ALS-derived gridded AGB estimates (Fig. 2c) was estimated by repeating this process, using the 100 UAV-LS-derived gridded AGB estimates as training data for a further ensemble of 100 XGBoost models. Uncertainty is expressed as 90% confidence intervals throughout.
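The Monte Carlo measurement variance, the model-variance calibration and their combination into a 90% interval can be sketched as below. This is an illustration under stated assumptions rather than the code used here. `fit_predict` stands in for training one XGBoost model on a sampled set of labels. The slope is fitted by zero-intercept least squares of squared cross-validation residuals on predictions, which is one plausible reading of the calibration. The interval assumes Gaussian errors (±1.645 standard deviations), and sampled labels may be negative because the sampling distribution is Gaussian.

```python
import numpy as np


def mc_measurement_variance(train_means, train_covs, fit_predict, n_samples=100, seed=0):
    """Per-pixel mean and variance of predictions from models trained on sampled labels.

    train_means/train_covs: one mean vector and covariance matrix per training plot.
    fit_predict(labels_per_plot) -> 1-D array of predictions for the target pixels.
    """
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        labels = [rng.multivariate_normal(m, c) for m, c in zip(train_means, train_covs)]
        preds.append(fit_predict(labels))
    preds = np.asarray(preds)
    return preds.mean(axis=0), preds.var(axis=0, ddof=1)


def calibrate_model_variance_slope(cv_pred, cv_ref):
    """Zero-intercept least-squares fit of squared CV residuals on predicted AGB."""
    p = np.asarray(cv_pred, dtype=float)
    r2 = (np.asarray(cv_ref, dtype=float) - p) ** 2
    return float(np.sum(p * r2) / np.sum(p * p))


def interval_90(pred, meas_var, slope):
    """90% interval assuming Gaussian errors and independent variance components."""
    pred = np.asarray(pred, dtype=float)
    total_var = np.asarray(meas_var, dtype=float) + slope * np.clip(pred, 0.0, None)
    half = 1.645 * np.sqrt(total_var)
    return pred - half, pred + half
```

Treating the two components as additive variances mirrors the independence assumption stated in the text; if they were correlated, a covariance term would be needed.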

Conventional methods

Allometry-derived tree-scale AGB estimates were produced using three allometric models widely used for carbon stock estimation in the region. The first is the pan-tropical allometry described in30, which uses stem diameter, tree height and basic woody tissue density as predictor variables and was calibrated from the harvest of 4,004 trees across the tropics and subtropics. The second and third are the miombo woodland-specific allometries described in29, calibrated from the harvest of 167 trees in Tanzania, which use (i) stem diameter only and (ii) stem diameter and tree height, respectively. Tree height was derived from the TLS data. To exclude potential errors in linking the two data sets, trees whose stem diameters differed by more than 5 cm between the inventory and TLS (n = 188) were excluded from the tree-level AGB comparison.
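The pan-tropical model calibrated on 4,004 harvested trees is, assuming reference 30 is Chave et al. (2014), AGB = 0.0673 × (ρD²H)^0.976. Here AGB is in kg, ρ in g cm⁻³, D in cm and H in m. The sketch below implements that equation together with the 5 cm diameter-matching rule. The miombo-specific allometries29 are not reproduced because their coefficients are not given here, and the function names are illustrative.

```python
def chave2014_agb_kg(d_cm, h_m, wd_g_cm3):
    """Pan-tropical allometry (assumed Chave et al. 2014): AGB in kg."""
    return 0.0673 * (wd_g_cm3 * d_cm ** 2 * h_m) ** 0.976


def matched_trees(inventory_d_cm, tls_d_cm, max_diff_cm=5.0):
    """Indices of trees whose inventory and TLS stem diameters differ by <= max_diff_cm."""
    return [i for i, (a, b) in enumerate(zip(inventory_d_cm, tls_d_cm))
            if abs(a - b) <= max_diff_cm]


# A 30 cm, 15 m tall tree with basic density 0.6 g/cm^3 gives roughly 440 kg.
agb = chave2014_agb_kg(30.0, 15.0, 0.6)
```

Applying the matching rule before comparing allometric and TLS-derived AGB keeps linkage errors from being mistaken for allometric error.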

Four representative emission factors (EFs) were gathered to enable conventional region-scale estimation of AGB. We used values from: (i) the IPCC default for African subtropical dry forest described in12 (derived from a combination of L- and C-band radar); (ii) Mozambique’s Forest Reference Emission Levels31 for semi-deciduous forest including miombo (based on the country’s National Forest Inventory and the allometries described in29); (iii) the Zambezia province-specific value described in32 (obtained from L-band radar and a network of forest plots using the allometry of30); and (iv) the Mozambique-wide value described in33 (from destructive harvesting coupled with a 27 ha forest inventory).

Comparison with GEDI

GEDI L2A (version 2) and L4A (version 2.1) products (i.e., relative height metrics and AGB density (AGBD), respectively) were downloaded from NASA’s Earthdata65. These data were filtered to include only observations acquired between day-of-year 275 and 365 (to match seasonality) in 2018–2022, with a sensitivity greater than 90%, spatially overlapping the ALS data, and with a 98th-percentile relative height (RH98) below 35 m. This retained 18,611 of the 72,101 available GEDI observations (Fig. 5a). Coincident MSL-derived AGBD was retrieved by simulating a circle of 12.5 m radius at each GEDI footprint's coordinates and computing a weighted average of the overlapping 10 m resolution lidar-derived AGB predictions. No geospatial alignment of GEDI footprints with the ALS data was applied.
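The shot filtering and footprint extraction can be sketched as follows. This is a hedged illustration. Footprint weights are taken as the fractional area of each 10 m cell inside the 12.5 m radius circle, approximated by sub-sampling each cell; this is one plausible form of the weighted average. The grid is assumed to be north-up with its origin at the top-left corner. Function and argument names are hypothetical and are not GEDI product field names.

```python
import numpy as np


def keep_shot(doy, sensitivity, rh98):
    """Filters described in the text: DOY 275-365, sensitivity > 0.9, RH98 < 35 m."""
    return 275 <= doy <= 365 and sensitivity > 0.9 and rh98 < 35.0


def footprint_mean(agb_grid, x0, y0, cx, cy, radius=12.5, cell=10.0, sub=10):
    """Area-weighted mean of a gridded map within a circular footprint.

    agb_grid[row, col]: row 0 at the northern edge y0, col 0 at the western edge x0.
    Overlap areas are approximated with sub x sub sample points per cell.
    """
    nrow, ncol = agb_grid.shape
    offs = (np.arange(sub) + 0.5) / sub * cell  # sample offsets within a cell
    c0 = int((cx - radius - x0) // cell)
    c1 = int((cx + radius - x0) // cell)
    r0 = int((y0 - (cy + radius)) // cell)
    r1 = int((y0 - (cy - radius)) // cell)
    total, weight = 0.0, 0.0
    for r in range(max(r0, 0), min(r1, nrow - 1) + 1):
        for c in range(max(c0, 0), min(c1, ncol - 1) + 1):
            xx, yy = np.meshgrid(x0 + c * cell + offs, y0 - r * cell - offs)
            frac = np.mean((xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2)
            if frac > 0 and np.isfinite(agb_grid[r, c]):
                total += frac * agb_grid[r, c]
                weight += frac
    return total / weight if weight > 0 else float("nan")
```

Renormalising by the summed weights means cells missing from the map (NaN) simply drop out of the average instead of biasing it towards zero.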

Additionally, we tested the influence of the AGB model underpinning GEDI’s L4 product in the region, i.e., the African dry broadleaf forest model (DBT.Af)34. Relative heights retrieved from the ALS data (calculated from the height distribution of returns within a point cloud) are not comparable to GEDI-derived relative heights (derived from the waveform of a single laser pulse). We therefore simulated GEDI-like waveforms from the ALS data for 10,000 randomly distributed 25 m diameter point cloud sections within the ROI using gediRat35. The resulting simulated waveforms were directly comparable with GEDI’s waveforms and were thus used as input to GEDI’s DBT.Af model to predict AGBD.
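To illustrate why point-cloud height distributions and waveform relative heights differ, the sketch below builds a crude waveform from discrete returns. Returns are weighted by a horizontal Gaussian footprint and blurred vertically by a Gaussian pulse, and RH_n is read as the height at which cumulative energy from the bottom reaches n%. It is a greatly simplified stand-in, not gediRat's algorithm. The footprint and pulse widths are assumed values, heights are assumed to be already normalised to ground level, and noise, ground-mode detection and instrument effects are ignored.

```python
import numpy as np


def simulate_waveform(points, cx, cy, footprint_sigma=5.5, pulse_sigma=0.6, dz=0.15):
    """Very simplified large-footprint waveform from discrete returns.

    points: (N, 3) array of x, y and height above ground (m).
    footprint_sigma and pulse_sigma are illustrative assumptions.
    """
    pts = np.asarray(points, dtype=float)
    # Horizontal Gaussian weighting of returns by distance to the footprint centre
    d2 = (pts[:, 0] - cx) ** 2 + (pts[:, 1] - cy) ** 2
    w = np.exp(-0.5 * d2 / footprint_sigma ** 2)
    z = np.arange(pts[:, 2].min() - 3 * pulse_sigma, pts[:, 2].max() + 3 * pulse_sigma, dz)
    # Vertical blurring of each return by a Gaussian outgoing pulse
    kernel = np.exp(-0.5 * ((z[:, None] - pts[None, :, 2]) / pulse_sigma) ** 2)
    wave = (w[None, :] * kernel).sum(axis=1)
    return z, wave / wave.sum()


def relative_heights(z, wave, ground=0.0, percentiles=(50, 98)):
    """RH_n: height above ground at which cumulative energy (from the bottom) reaches n%."""
    cum = np.cumsum(wave) / np.sum(wave)
    out = {}
    for p in percentiles:
        i = min(int(np.searchsorted(cum, p / 100.0)), len(z) - 1)
        out[p] = float(z[i] - ground)
    return out
```

Because each return's energy is smeared by the pulse, waveform RH98 sits above the highest canopy return, whereas the 98th percentile of point heights cannot exceed it.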