Introduction

Forest ecosystems are often characterized in terms of structure, composition, and functions [1]. Light Detection and Ranging (LiDAR) remote sensing (RS) has substantially improved our understanding of forest structure around the world in recent decades [2,3,4,5]. LiDAR instruments provide explicit three-dimensional (3D) data that have enabled measurements of forest structure parameters such as canopy height, leaf area index, and diameter at breast height across different scales with unprecedented accuracy [6,7,8].

LiDAR data can be collected from a variety of sensors and platforms, resulting in a range of 3D data types (mostly point clouds) with different point densities, accuracies, and perspectives. LiDAR sensors can be mounted on ground-based platforms, both fixed and mobile [3, 9]; on airborne platforms, including unoccupied aerial vehicles (UAVs or drones), helicopters, and airplanes [10, 11]; and on space-based platforms, such as satellites or the International Space Station [7, 12, 13]. This cross-scale LiDAR data collection has enabled many applications of tree and forest measurements, including forest inventories and biomass estimates [14, 15], species and habitat classification, biodiversity assessment [16, 17], forest fuel estimates [18] and detailed 3D reconstruction of trees [19, 20].

While LiDAR instruments have developed rapidly and extensively, the data continue to have limitations. For example, ground-based LiDAR data might not record all trees and tree tops due to occlusion [21]. Conversely, airborne and spaceborne LiDAR instruments can measure the top of the canopies and, in some cases, forest vertical structure, but rarely capture stems below canopies [22]. Moreover, LiDAR is specifically used to gather information on vegetation structure, but provides limited information on other important drivers of forest ecosystems, such as composition and functioning. These limitations have resulted in a rapid increase in data fusion approaches, in which data from various instruments are merged (multi-sensor approach) to enhance the data and their application potential.

Various definitions of data fusion have been proposed [23, 24]. Here, we focus on multi-source or multi-sensor LiDAR data fusion, defined as “the merging of data or derived features from different sources (instruments/devices), of which at least one is LiDAR data, to improve the information content of the data sources and enable enhanced forest observations”. Multi-sensor data fusion approaches have been deemed useful in overcoming measurement and sampling limitations from the original dataset to the final information extraction [25].

This review paper aims to summarize the current state-of-the-art LiDAR data fusion approaches for forest observations and to identify the main challenges that need to be addressed to move forward. We consider two levels of multi-sensor data fusion in this review: (1) data-level fusion, and (2) feature-level fusion. In data-level fusion, raw datasets from various sources are combined into one dataset or product (e.g. merging of two LiDAR point clouds, one collected with ground-based LiDAR and the other with an unoccupied aerial vehicle laser scanner (ULS)) [26]. In feature-level fusion, features extracted individually from various data sources are merged into new features or vectors (e.g. merging of structural parameters from LiDAR with coincident spectral parameters from hyperspectral (HS) data to derive a species classification) [27, 28].

This paper includes two major components. The first component provides a structured literature review on LiDAR data fusion addressing the following questions:

  • What are the trends in LiDAR data fusion in the last decade?

  • What are the main motivations and applications of LiDAR data fusion?

  • What are the main methods used to perform data fusion?

  • What are the main gains of LiDAR data fusion?

The literature review was then analyzed by a team of 11 international experts to address the following key questions:

  • What is ‘data fusion’ and how should this term be used in our community?

  • What are the most important lessons learned about data fusion in forest observations?

  • What are the main challenges in data fusion for operational applications?

  • What should the community focus on to move data fusion forward?

The experts in the team were assembled through the EU COST Action 3DForEcoTech, an EU initiative to bring together all experts on LiDAR data for forestry within the EU. An open call was held to solicit scientists interested in collaborating on this literature review. The final team was assembled to encompass all expertise required for addressing the key questions, including scientists with expertise on all types of LiDAR (mobile, terrestrial, airborne and spaceborne) and on fusion with all common datasets assessed here (multispectral, hyperspectral, and radar).

Structured Literature Review Method

We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach [29, 30]. The following search terms were used in the Web of Science database: LiDAR AND fus* (Topic) and forest* OR tree OR canop* (Topic) and structure OR height OR inventory (Topic). We included literature from the last decade, published between January 2014 and May 2023 in English, with a publication status of ‘article’ or ‘review article’. As defined in the introduction, we focused on multi-sensor data fusion. We did not consider studies that combined two datasets from the same sensor collected at different times or at different locations. By limiting our search to the term ‘data fusion’ and excluding alternative search words such as ‘data integration’ or ‘data combination’ (which may refer to the same process), we demonstrate how ‘data fusion’ has specifically been used in the last decade. In the Discussion sub-section Data fusion, we further discuss the term ‘data fusion’ in relation to other terms with a potentially similar meaning in the LiDAR context.
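The logic of the search string can be sketched as a small text filter. This is only an illustration of how the three Topic clauses combine (every clause must be satisfied, and any one term within a clause is enough); the function name `matches_query` and the regular expressions are assumptions made for this sketch, and the database's own term handling is not reproduced.

```python
import re

# Each inner list is one group of alternatives; all groups must match.
# A trailing * in the search string is a right-hand wildcard, written here as \w*.
TOPIC_GROUPS = [
    [r"lidar"],                              # LiDAR AND ...
    [r"fus\w*"],                             # ... fus*
    [r"forest\w*", r"tree", r"canop\w*"],    # forest* OR tree OR canop*
    [r"structure", r"height", r"inventory"], # structure OR height OR inventory
]


def matches_query(text: str) -> bool:
    """Return True if every group has at least one term present in the text.

    Hypothetical helper: matching is case-insensitive and on whole words only,
    which is a simplification of how a bibliographic database treats terms.
    """
    lowered = text.lower()
    return all(
        any(re.search(rf"\b{term}\b", lowered) for term in group)
        for group in TOPIC_GROUPS
    )


if __name__ == "__main__":
    print(matches_query("Fusion of LiDAR and hyperspectral data for forest height mapping"))
    print(matches_query("LiDAR data fusion for building footprint extraction"))
```

The eligibility criteria (date range, language, publication type) are applied separately from the topic search and are not part of this filter.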

Literature Search Results

The Web of Science query resulted in 664 papers (Fig. 1). Of these, 407 adhered to the eligibility criteria defined above (2014-2023, English, article or review). The abstracts of these 407 papers were screened by two independent reviewers, who decided whether to include or exclude a paper based on two criteria: (1) some aspect of trees/forest, relevant to forestry applications, was assessed, and all papers that solely studied crops, infrastructure or buildings were eliminated, and (2) the fusion must include LiDAR data.

Fig. 1 Framework of structured literature review and coding scheme

Extracting Information from Literature

We developed a coding scheme to organize the information in the 151 papers in a comprehensive and understandable fashion that addressed the four main research questions. The coding scheme consisted of five main categories: general information, geographic location, survey area, data characteristics, and survey goals (Table 1). In the category ‘general information’, we included the most pertinent information, so the paper could be relocated for later analysis. In ‘geographic location’, we included information on the continent and country/countries of the study areas. Regarding ‘survey area’, we included survey scale (i.e. global or local) and forest stand (i.e. type of vegetation surveyed). In ‘data characteristics’, we included information on the LiDAR platform used, as well as the sensor's name and type. We also recorded the datasets that were fused with the LiDAR dataset. Within ‘survey goals’, we included information on the application for which the fusion was used, the motivation (aim) for the fusion (e.g. increasing spatial resolution of data product), the type of method used to fuse the datasets, and reported gain of the fusion process.

Table 1 Categories and subjects included in the coding scheme for the structured literature review of the 151 selected papers

Trends in Data Fusion Literature

The number of publications concerning LiDAR data fusion for forests shows a slight general upward trend over the last 10 years, with a notable increase in 2022 (Fig. 2). LiDAR data from airborne platforms were most commonly used. These airborne platforms include instruments mounted on both UAVs and occupied aircraft. Fusion with data from terrestrial platforms, including terrestrial laser scanners (TLSs) and mobile laser scanners (MLSs), has emerged in recent years, starting in 2016. The use of spaceborne LiDAR sensors has also increased slightly, with satellite papers published in 2016 and 2017 employing data from ICESat/GLAS and papers published after 2018 employing data from ICESat-2 and GEDI.

Fig. 2 Number of publications on LiDAR data fusion and general publication trend in LiDAR in forestry applications over the last decade. The shaded bars refer to the various LiDAR platforms. ‘Multiple platforms’ indicates that LiDAR data from two (or more) different platforms were fused. Note that 2023 includes only papers published until May

LiDAR data can be fused with data collected from a similar platform (e.g. airborne-airborne) or a different platform (e.g. airborne-spaceborne). Fusion of airborne LiDAR and other airborne data types was the most common type of fusion encountered (45.4%), followed by fusion of LiDAR data from airborne and spaceborne devices (29.8%). Spaceborne LiDAR fused with data collected by other spaceborne sensors and airborne-terrestrial fusion each accounted for 11.3% of the publications, whereas fusion of terrestrial LiDAR with other data from terrestrial platforms was the least common (2.1%) (Table 2).

Table 2 Number of publications by platform, where at least one of the sensors is LiDAR

In terms of geographical representation (Fig. 3), studies from North America (38%), Europe (31%) and Asia (21%) together represent 90% of the publications. Of the remaining 10%, 5% study Australia and 5% focus on Africa and South America combined. In particular, our literature review found very few LiDAR data fusion studies in the southern hemisphere. This pattern is consistent with a review of the geographic distribution of authorship in remote sensing publications [31], which documents that four countries (the USA, Italy, Germany, and China) are over-represented, with almost no contributions from South America and Africa. Our literature sample shows that most of the fusion studies in Asia take place in China, while other countries such as Iran, India, and Malaysia are each studied only once.

Fig. 3 Geographic distribution of study locations in the 142 case studies included in our literature sample

Main Motivations and Applications of LiDAR Data Fusion

Motivations

Three main motivations for data fusion were found: (1) Fusion of data across platforms can enhance the spatial or temporal resolution of the data product. (2) Two different LiDAR datasets can be fused to improve data density and/or overcome occlusion. For example, terrestrial and aerial point clouds are fused to better represent both the top and the bottom of the canopy, and to subsequently extract structural parameters more accurately [32, 33]. (3) Fusion of data from the same platform primarily enriches the LiDAR dataset with additional information. For example, spectral data can be fused with LiDAR data to create a better estimate of above-ground biomass (AGB) or to improve tree segmentation.

Applications

In the LiDAR data fusion literature, we find two main streams of applications, at the individual tree level (ITA - Individual Tree Approach) and at the area level (ABA - Area-Based Approach). Among all papers reviewed, 27% focus on ITA, 50% on ABA, 17% on both ITA and ABA, and 6% are review papers. The main applications of LiDAR data fusion at these two levels are divided into seven categories:

  1) Classification (tree species/land cover): 29.5% of the papers [27, 28, 34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73] encompassed land cover classification, specifically, forest type classification, classification of individual tree species or genus, and forest habitat mapping.

  2) Growing stock volume / above-ground biomass: 17.7% of the papers [74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98] are studies in which data fusion was used to improve biomass estimates both at ABA and ITA levels.

  3) Forest structure: 15.5% of the papers [11, 13, 32, 33, 99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115] include different datasets fused to improve the extraction of horizontal as well as vertical structure parameters beyond canopy height. This category includes individual tree biometric parameters such as crown diameter, crown length or base height. On an area-based level, the information derived includes mean crown length, number of vertical layers, gaps, crown coverage, stem density, basal area, DBH distribution etc. This category also includes assessment of post-fire forest structure and regeneration.

  4) Tree height: 12.7% of the papers [116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133] include canopy height represented by different parameters such as mean height, quantiles, deviations etc. Data fusion was applied to generate better estimates of tree height at a single tree level or a stand level, mainly by fusing aerial LiDAR data with other LiDAR platforms.

  5) Segmentation: 9.2% of the papers [134,135,136,137,138,139,140,141,142,143,144,145,146,147] delineate individual tree crowns and identify locations of individual trees. In ABA, the segmentation includes delineation of homogeneous forest patches as well as forest stands.

  6) Other: 9.1% of the papers [148,149,150,151,152,153,154,155,156,157,158,159,160] include a variety of applications, such as mapping the pigment distribution and quantifying taxonomic, functional, and phylogenetic diversity, tree age estimation etc.

  7) Fuel load: 6.3% of the papers [161,162,163,164,165,166,167,168,169] include applications that deal with fuel load and forest fire modeling.

Methods for LiDAR Data Fusion

The methods used for LiDAR data fusion can generally be divided into two main categories. Data-level fusion studies typically merge datasets from different sensors during the pre-processing stage, before any formal classification or feature extraction occurs, whereas feature-level fusion studies merge post-classification outputs and extracted features from disparate datasets to generate a new dataset. A third level, namely decision-level fusion, exists in the literature, but none of the papers in our literature sample fell into this category [170, 171].

Data-level Fusion

Among all papers we reviewed, 22% performed data-level fusion. Point cloud-to-cloud fusion can be achieved by combining, for example, airborne and terrestrial LiDAR datasets using the reference points acquired in both surveys [19]. TLS typically acquires detailed measurements at a plot scale, while ULS can obtain measurements across a larger spatial extent at a landscape scale [26]. The raw datasets can be fused using ground control points (GCPs) or by identifying similar features in the datasets [74, 100] using the same coordinate system acquired through GNSS or total stations. Other studies [26, 112, 162] used manual co-registration by identifying similar features such as the tallest tree, trees with large crowns, or tree locations. These features were used to guide the manual shifting process and to correctly co-register the two datasets. Defining appropriate key points for co-registration is challenging, especially in forest point clouds with few distinct objects, and can become even more complicated in plantation forests where trees share similar characteristics [32]. Some authors suggest using software tools to co-register point clouds based on key points [33] or the Iterative Closest Point (ICP) algorithm [140, 155, 172] in CloudCompare. The quality of the fused data depends on the forest conditions and the data characteristics, namely the number of terrestrial scans and the distance of the scanners from the target [115, 173]. Another type of data-level fusion included LiDAR data fusion with spectral bands and indices, where spectral information was projected onto the point cloud [74, 113, 153] using, for example, CloudCompare [74] and FUSION software [113]. Reflective targets aid the co-registration of terrestrial images and point clouds, so that RGB pixel colors can be assigned to point locations [153].
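To make the feature-based co-registration step concrete, the following is a minimal sketch of estimating a rigid transformation from matched features, such as the positions of the same trees identified in two point clouds. It is not ICP and is not taken from any of the reviewed papers: it assumes that correspondences are already known, that the clouds differ only by a rotation about the vertical axis and a horizontal shift, and that heights already share a vertical datum. The function names `fit_rigid_2d` and `apply_rigid_2d` are hypothetical.

```python
import math


def fit_rigid_2d(source, target):
    """Least-squares rotation angle and translation mapping source onto target.

    source, target: equal-length lists of matched (x, y) pairs, e.g. stem
    positions of the same trees located in both datasets (assumed known).
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need at least two matched point pairs")
    n = len(source)
    sx = sum(p[0] for p in source) / n
    sy = sum(p[1] for p in source) / n
    tx = sum(q[0] for q in target) / n
    ty = sum(q[1] for q in target) / n

    # Accumulate dot and cross products of the centred coordinates;
    # their ratio gives the optimal rotation angle in closed form.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        ax, ay = px - sx, py - sy
        bx, by = qx - tx, qy - ty
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)

    # Translation that maps the rotated source centroid onto the target centroid.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    shift = (tx - (cos_t * sx - sin_t * sy), ty - (sin_t * sx + cos_t * sy))
    return theta, shift


def apply_rigid_2d(points, theta, shift):
    """Rotate (x, y, z) points about the vertical axis, then translate in x/y."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (cos_t * x - sin_t * y + shift[0], sin_t * x + cos_t * y + shift[1], z)
        for x, y, z in points
    ]
```

Once the transformation is estimated from the matched features, it can be applied to every point of one cloud and the two clouds concatenated. In practice a coarse alignment of this kind is often refined afterwards, for example with ICP.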

Feature-level Fusion

A total of 78% of the papers performed feature-level fusion by merging post-classification outputs, rasterized LiDAR-derived products, extracted features, and spectral bands and indices to derive a final output. Feature-level fusion in this context can be broadly categorized into pixel-based fusion and object-based fusion [174]. Pixel-based fusion primarily occurs among airborne platforms and between airborne and satellite platforms, mostly combining LiDAR and spectral data. Many of these studies rasterized the LiDAR data to generate canopy height models (CHM) and digital terrain models (DTM) and layer-stacked these outputs with MS and HS bands as inputs for subsequent classification algorithms [28, 38, 54, 61, 149]. In most of these pixel-based fusion cases, the datasets are pre-processed separately and then combined. For example, hyperspectral data are processed in ENVI, while LiDAR data products are created separately. The combined data stack is then used for classification, often with machine learning methods [28]. Object-based fusion involves direct segmentation at both the individual tree scale and plot scale, followed by fusion based on various extracted features for the objects. For example, LiDAR data can be used to segment individual tree canopies, often using inverse watershed algorithms, and features extracted from spectral data are then added to those segments, essentially creating a new vector-format dataset. The resulting spatial or vector format outputs were then used, for example, to classify tree species with machine learning methods [47, 66, 75, 102]. Most commonly, feature-level data fusion takes place in a coding environment, such as R packages to segment trees, or Python for post-processing the datasets with machine learning algorithms. Readily available software solutions to process different types of data and combine the resulting features seem to be lagging behind.
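The two feature-level variants described above can be illustrated with a small sketch. It assumes that all rasters have already been co-registered and resampled to one grid, represents them as plain nested lists for readability, and uses hypothetical function names (`stack_layers`, `crown_features`); the convention that crown ID 0 marks background is also an assumption. The resulting feature vectors would then be passed to a classifier of the user's choice.

```python
from collections import defaultdict


def stack_layers(chm, bands):
    """Pixel-based fusion: build one feature vector per pixel.

    chm: 2D list of canopy heights; bands: list of 2D lists on the same grid.
    Returns a dict mapping (row, col) to [height, band_1, band_2, ...].
    """
    rows, cols = len(chm), len(chm[0])
    for band in bands:
        if len(band) != rows or any(len(r) != cols for r in band):
            raise ValueError("all layers must share the CHM grid")
    return {
        (r, c): [chm[r][c]] + [band[r][c] for band in bands]
        for r in range(rows)
        for c in range(cols)
    }


def crown_features(segments, stacked):
    """Object-based fusion: mean feature vector per crown segment.

    segments: 2D list of integer crown IDs from a LiDAR-based segmentation,
    with 0 meaning background (an assumed convention).
    """
    sums = {}
    counts = defaultdict(int)
    for (r, c), features in stacked.items():
        crown_id = segments[r][c]
        if crown_id == 0:
            continue  # skip pixels outside any crown
        if crown_id not in sums:
            sums[crown_id] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[crown_id][i] += value
        counts[crown_id] += 1
    return {cid: [s / counts[cid] for s in total] for cid, total in sums.items()}


if __name__ == "__main__":
    chm = [[10.0, 12.0], [0.5, 20.0]]
    red = [[0.1, 0.2], [0.3, 0.4]]
    nir = [[0.5, 0.6], [0.3, 0.8]]
    crowns = [[1, 1], [0, 2]]
    print(crown_features(crowns, stack_layers(chm, [red, nir])))
```

Real workflows would use array and raster libraries rather than nested lists; the structure of the fusion step is the same.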

Gains of LiDAR Data Fusion

To examine the gains that LiDAR data fusion brings for each of the application categories outlined above, we examined the studies that directly compared the performance of their methods with and without fusion.

Classification (Tree Species/Land Cover)

Species classification based exclusively on LiDAR data has proven effective in particular circumstances, including when the species to be discriminated have contrasting silhouettes or stature [45, 59] or when the segmentation addresses a broad class separation between evergreen and deciduous species [34]. In our review, when LiDAR alone was compared to LiDAR fused with spectral information, overall classification accuracy increased by 41% on average. Conversely, when fused datasets were compared to spectral information alone, overall accuracy increased by only 10-14%. A few studies reported a beneficial effect of the combined use of LiDAR and spectral information by examining the importance of the various predictors in a Random Forest classification model [63]. Finally, in some cases, LiDAR was used only at the segmentation step to delineate tree crowns or stands [35, 66]. Vegetation height estimated from LiDAR data, when fused with MS and HS data, enhances the overall accuracy of species classification [28]. However, this generally benefited object-level classification more than pixel-level classification.

Growing Stock Volume and Biomass

Volume and/or AGB assessment requires structural and species information. While LiDAR data provide information about structure, fusion with optical data is often sought for species-specific estimates. Among the papers in this section, data fusion was performed mainly at the ITA (45%) and ABA (50%) levels, and much less at the landscape level (5%). Data fusion at tree-level mostly uses fusion of ground-based and airborne point clouds [77], addressing occlusion issues and enabling extraction of tree attributes such as DBH and total height with greater accuracy. For larger acquisitions in complex terrain, fusion of ULS, photogrammetric point clouds and MS images shows significant improvement in explained variance and error. For example, [75] fused ULS and HS data at the individual tree level, increasing the R² from 0.75 to 0.89. In [81] (ABA), by fusing ALS and MS data, the authors reduced RMSE from 18.4% (LiDAR alone) and 19% (MS alone) to 16.8%. In [89] (ITA), by fusing RGB and MS data, the authors increased their R² from 0.77 to 0.81. Plot-level data fusion involved predominantly airborne or spaceborne data, which allowed larger scale assessment. While fusion with ALS mostly consists of combining continuous data over the area of interest [75, 88, 94, 95], applications with spaceborne data mostly consist of upscaling approaches [76, 81,82,83, 87]. In another study [94], fusing ALS and HS data increased R² from 0.81 to 0.87 for ITA and 0.65 to 0.84 for ABA. In [77] (ITA), the utilization of both TLS-based DBH and ULS-based tree height resulted in a reduced RMSE ranging from 8.6% to 12.7%. These RMSE values compare favorably to the RMSE values of 10.1% to 20.4% when exclusively using TLS and 30.3% to 76.9% when relying solely on ULS.
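The gains in this and the following sections are expressed mostly as RMSE, relative RMSE (in %) and R². As a reminder of what these numbers measure, the sketch below computes them under common conventions: relative RMSE is normalized by the mean of the observations, and R² is the coefficient of determination. These conventions are assumptions, since individual studies may define the metrics differently (for example, R² as a squared correlation), and the function names are hypothetical.

```python
import math


def _check(observed, predicted):
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")


def rmse(observed, predicted):
    """Root mean square error, in the units of the observed variable."""
    _check(observed, predicted)
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def relative_rmse(observed, predicted):
    """RMSE as a percentage of the observed mean (one common convention)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    _check(observed, predicted)
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Comparing a fused model with a single-sensor model then amounts to evaluating these functions on the same reference observations for both sets of predictions.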

Forest Structure

The primary objective in fusing ground-based LiDAR with ULS and ALS data is to capitalize on the advantages of ground-based LiDAR, which effectively captures the lower part of the trees, in combination with the strengths of airborne LiDAR, which accurately represents the crowns. In [26], fused TLS and ULS were used to measure tree height, crown projection area (CPA) and crown volume (CV). In estimating height, the RMSE with TLS and ULS alone was 0.30 m and 0.11 m, respectively, while the fused dataset RMSE was 0.05 m. In estimating CPA, the RMSE with TLS and ULS alone was 3.06 m² and 4.61 m², respectively, while the fused dataset RMSE was 0.46 m². Finally, for CV, the RMSE with TLS and ULS alone was 29.63 m³ and 30.23 m³, respectively, while the fused dataset RMSE was 8.30 m³. Another study [32] that fused ground-based LiDAR and ULS observed significant R² improvements in tree height (9%), stem volume (5%), and crown volume estimates (18%). In [26, 33, 112, 115], there is a strong focus on co-registration issues before individual tree parameters were extracted. Furthermore, [33] achieved enhanced accuracy for DBH measurements through TLS and ULS data fusion: 2.1% compared to TLS alone and 20.7% compared to ULS alone. [113] fused ALS and MS data and reported improved R² when compared with ALS alone: quadratic mean diameter (from 0.5 to 0.64), basal area (from 0.53 to 0.73), tree height (from 0.92 to 0.94), stem density (from 0.29 to 0.30) and stand density index (from 0.72 to 0.82). Among the papers that use ALS and satellite data, [108] derive total volume and basal area by fusing LiDAR and topographic information (TI). Using LiDAR alone, the R² is 0.67 for volume and 0.61 for basal area, while fusion with TI increased the R² to 0.74 and 0.69, respectively. MS-ALS-TI fusion increased the R² further to 0.85 and 0.84, respectively.

Tree Height

For tree height estimates, 50% of the papers focus on ITA and 50% on ABA. For example, in [126], the spatial resolution of tree-top height estimates was improved by fusing low-density ALS data with high-resolution optical images using a k-NN technique, which allowed tree height to be estimated for crowns that are not represented in the LiDAR data. That paper shows that a greater number of LiDAR points associated with tree crowns enhances the accuracy of tree-top height estimation. With the fusion, the authors detected 97% of the total trees with a tree-top mean absolute error of 2.45 m (compared with 3.70 m for LiDAR data alone). In [122], the benefit of including LiDAR-derived topographic data for estimation of canopy heights from TanDEM-X InSAR data is demonstrated. Furthermore, the use of the full-resolution DTM from the Land, Vegetation, and Ice Sensor (LVIS) instead of the simulated GEDI DTM significantly decreased the RMSE from 4.6 m to 3.5 m, and the bias from 1.8 m to 1.3 m.
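The general idea behind k-NN imputation of tree height can be sketched as follows. Crowns that have both image-derived features and a LiDAR height act as references, and a crown without LiDAR returns receives the mean height of its k most similar reference crowns. This is a generic illustration and not the procedure of any specific study: the features, the value of k, the Euclidean distance and the function name `knn_impute_height` are all assumptions.

```python
import math


def knn_impute_height(reference, target_features, k=3):
    """Estimate height for a crown that has image features but no LiDAR height.

    reference: list of (feature_vector, lidar_height) pairs for crowns observed
    by both sensors. Features are assumed to be on comparable scales; in
    practice they would usually be standardized first.
    """
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and the number of reference crowns")
    # Rank reference crowns by Euclidean distance in feature space.
    nearest = sorted(
        reference, key=lambda item: math.dist(item[0], target_features)
    )[:k]
    return sum(height for _, height in nearest) / k


if __name__ == "__main__":
    # Hypothetical features: (crown diameter in m, a spectral index)
    reference = [
        ([1.0, 0.20], 10.0),
        ([1.2, 0.25], 12.0),
        ([5.0, 0.80], 25.0),
        ([5.5, 0.75], 27.0),
    ]
    print(knn_impute_height(reference, [1.1, 0.22], k=2))
```

The choice of k trades off noise against smoothing: a small k follows individual reference crowns closely, while a larger k averages over more of them.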

Segmentation

In the majority of the literature reviewed, data fusion was mainly used for single tree segmentation using airborne data [135, 138, 143]. Segmentation challenges, especially for tree-level data, include georeferencing the data products and balancing data with different spatial resolutions [138]. At the single-crown level, raw point clouds or point cloud-based metrics are easier to fuse than pixel-based information [139]. The results presented by [135] show a significant difference between fused data and ALS alone: for low-density forests, the ITA method based on ALS alone correctly detects only 63% of trees, compared to 92% when fusing data from ALS and HS. For high-density forests, fusion detects 70% of the trees compared to 62% with ALS alone. In [137, 143], the authors fused ALS and MS data, increasing their segmentation accuracy by 2-4% compared to ALS alone. In [138], fusion of ALS and HS increased segmentation accuracy by 5% compared to single-sensor accuracy.

Other

The ‘other’ applications included LiDAR data fusion studies focused on wetland/marsh areas, boreal forests and a natural disaster impact assessment [155, 156, 158]. For example, [158] fused airborne LiDAR with MS imagery to assess forest loss in a wetland zone. They document that forest/non-forest classification accuracy improved from 86-87% to 91-93% demonstrating a small ~5% increase in accuracy due to the inclusion of LiDAR metrics. [155] demonstrated that their automatic ALS and TLS point cloud co-registration resulted in a denser point cloud, in which the stems and canopy of individual trees were better represented than in the single LiDAR datasets, but provided no quantitative improvement on retrieval of canopy/forest/tree information in a boreal forest. [156] developed a method to assess wind damage by fusing ALS and MS imagery. They conclude that adding the structural metrics from ALS to the spectral information provides estimates of structural damages that cannot be retrieved with spectral data alone.

Fuel Load

At a landscape scale, multiple studies have documented that fusing ALS data with Landsat and Sentinel-2 satellite images improves total fuel estimates [168]. Specifically, [161] demonstrated that 24-32% of the remaining variability in surface fuels, uncharacterized by ALS data, can be explained by Landsat NDVI time-series. Furthermore, ALS data combined with Landsat time-series achieve both higher classification accuracy and lower prediction errors in post-fire snag classes and shrub cover estimates [165]. Similarly, airborne MS image-derived NDVI metrics, when fused with ALS, further improved the overall classification accuracy of post-fire regeneration types at stand scale by 10-50% [163]. Similar data fusion studies also predicted canopy fuel variables, such as canopy fuel load (kg/m²), and surface fuel layers (including coarse woody debris biomass) with adjusted R² ranging between 0.55 and 0.94 [166]. At the ITA scale, post-fire changes in DBH and biomass can be estimated by fusing MLS data with ULS/ALS, where the below-canopy measurements are enabled by the MLS data [162]. However, a fusion of ALS and TLS data for ITA metrics was recently documented to offer no particular advantage over either sensor used alone [169].

Discussion

The information from the structured literature review was discussed by an international panel of experts in Leiden, the Netherlands, May 11-12, 2023. The panel consisted of 11 scientists with expertise across all LiDAR platforms and their fusion with other datasets across the full range of forestry applications.

What is ‘Data Fusion’ and How Should This Term Be Used?

Through the literature search, it became apparent that there was confusion regarding what should be considered data fusion. Specifically, we found that the terms ‘data fusion’, ‘data combination’ and ‘data integration’ are used inconsistently. For example, we recognize that there are studies that perform data-level or feature-level fusion without calling it such, instead commonly referring to it as data combination [175, 176], data registration [173] or data integration [177, 178]. However, we found that those terms are also commonly used for instances where data fusion as defined here is not actually appropriate. These include, for example, instances where one dataset is used to train a model that makes predictions based on another dataset, which would be considered calibration/validation studies [179,180,181]. We do find a few such instances [118, 132] among our data-level and feature-level fusion examples.

Based on our literature review of papers that considered (multi-sensor) ‘LiDAR data fusion’, we define data- and feature-level data fusion as: the merging of data or derived features from different sources (instruments/devices), of which at least one is LiDAR, to improve the characteristics of the LiDAR dataset and/or enable enhanced forest observations. The term ‘data integration’ can be reserved for decision-level data fusion, where datasets are only combined to come to a conclusion (decision), but they are not used to generate a new dataset or data product as inputs for classification etc. [24, 182]. The term ‘data combination’ can be used to indicate the entire process that includes both data fusion starting at the pre-processing step through data integration at the decision-making step (Fig. 4).

Fig. 4 Proposed conceptual framework defining data fusion, data integration, and data combination, which are ambiguously used in the literature

It is important to note that we only focused on multi-source data fusion, while other instances of data fusion are ignored: multi-temporal data fusion (datasets repeatedly collected at different times with the same sensor), MS-LiDAR (MS data and LiDAR collected at the same time by the same instrument), and co-registration of data from the same instrument (e.g. strip adjustment of ALS data collection and co-registration of TLS point clouds acquired from various points of view to create a forest scene). These types of fusion, though beyond the scope of this review, can still be relevant for monitoring forest growth, species categorization, identifying tree locations and could be considered by practitioners.

What are the Most Important Lessons Learned About Data Fusion in Forest Observations?

Our review indicates that all common applications are improved using data fusion. Single tree segmentation can be improved by fusing spectral or 2.5D structural information from LiDAR data, especially in low-density forests. Results obtained with canopy height model for ITA were slightly improved when LiDAR data is fused with MS images. This application is likely to be more relevant at a local scale, where detailed information about individual trees is required. In growing stock volume or above-ground biomass assessments, data fusion can improve model performance by improving tree species classification. These applications can be relevant at local to regional scales. The use of airborne and spaceborne data fusion expands the study areas to larger extents. Tree height or canopy height are correctly detected by LiDAR data alone, and there is no real need for LiDAR data fusion for further improvements, but data fusion can extend the spatial and temporal resolution of derived data products. LiDAR data fusion with spectral information, such as MS or HS data, improves tree species classification accuracy compared to using LiDAR data alone. While LiDAR alone can be effective in certain circumstances, combining LiDAR with spectral information enhances the accuracy of species classification models significantly. Fusion of ground-based LiDAR data with airborne LiDAR data improves the assessment of forest structure parameters, including tree density, crown diameter, stem density and stand volume. Fusion of ground-based and airborne LiDAR data allows the combination of strengths from both sources, capturing information above and below the canopy layer. LiDAR data fusion for fuel load estimation has been used for characterizing canopy and surface fuels. At a landscape scale, fusing LiDAR data with MS images enhances the total fuel estimates, classification accuracy of post-fire snag classes and prediction of canopy fuel variables. 
In summary, data fusion can further improve the accuracy of a resulting data product or application, and it can improve the spatial and/or temporal resolution of such data products, providing valuable information for practitioners. We note, though, that many of these gains are marginal. It is therefore important to discuss how these methods can be operationalized.

What are the Main Challenges in Data Fusion for Operational Applications?

We identified several challenges with operationalizing data fusion approaches. One fundamental challenge arises from the use of two or more distinct RS datasets to develop a particular solution. Acquiring multiple datasets increases the overall cost, especially when combining data from independent acquisition platforms, such as ALS and HS data, or when dealing with large spatial extents. Although some airborne systems allow simultaneous data collection from multiple sensors (e.g. LiDAR and MS images), data providers must still process the acquired data, which adds further costs. Data fusion is also a major challenge for the data user, because the effort required to process two or more RS datasets increases significantly. Separate processing steps must be developed for each dataset, which increases the overall processing time, and each step must be individually evaluated and quality-checked. Faster processing requires greater computing power, which may be difficult to secure, especially in practical applications. Moreover, the data processing demands specific expertise to ensure methodological correctness, so analysts may need to acquire additional skills or collaborate with domain specialists to execute the analysis accurately. The processing time, the additional equipment, and the expertise required all increase the cost of the analyses and can be a barrier. Another major challenge in data fusion relates to the data themselves. Data sources may differ in resolution, accuracy, and spatial or temporal coverage, which can affect the effectiveness of fusion techniques. If the quality of the data is low or the fusion process is not optimized, fusion might not add substantial benefits or may introduce additional uncertainties.
A prevalent challenge in RS applications is the significant time lag between data collection (e.g., aerial flights) and the delivery of processed results to end users. The larger the surveyed area and the greater the number of datasets fused, the longer this lag becomes. Fusion also requires more validation and more rigorous accuracy assessment, which often reveals further deficiencies and errors that need to be addressed. This delay in information provision may render the data obsolete or limit their usefulness in rapidly changing situations, such as insect outbreaks or areas affected by severe wind or fire damage.

What are the Priorities in Moving Data Fusion Forward?

We find that the RS community can further advance LiDAR data fusion and thereby enable a wider range of applications, from environmental monitoring and resource management to disaster response. Several key areas should be prioritized to move the applications and methodologies of LiDAR data fusion forward. First, our structured review shows that more studies on LiDAR data fusion are needed in the southern hemisphere to better understand the limitations and advantages of such applications in the extensive rainforests of the global south, which have been relatively underexplored compared to the northern hemisphere. This underrepresentation has important implications because these regions include a large majority of the tropical forests, where LiDAR fusion may have many benefits. For example, tropical forests typically include tall trees with several middle and understory layers of dense canopies, where TLS data fused with ALS data could fully characterize the forest structure. Second, even though improvements from data fusion over LiDAR data alone have been reported for a variety of applications, it is still unclear to what extent these could be operationalized in a forestry setting. More information is required to properly balance the costs of additional data collection and processing, and the required expertise, against the benefits in accuracy or spatial and temporal resolution. Common data formats with metadata standards need to be established so that researchers can develop interoperable algorithms and collaborate more easily. As an example, the number of variables that can be extracted from ALS point clouds is practically unlimited, which makes standardizing these variables a persistent challenge. In [183], the authors suggested a list of 10 standard variables within 3 main classes (height, vertical variability, and cover) as a starting point to characterize the vegetation structure.
Moreover, in [184], the authors recommend metrics such as the skewness, kurtosis, or coefficient of variation of vegetation height to describe vegetation structure. Both papers proposed that the data be made available in raster format to standardize subsequent studies or operations. Addressing sensor-specific biases, radiometric differences, and geometric distortions across different data sources is essential to harmonize fused datasets effectively. It is also necessary to develop robust methods to quantify and address uncertainties in data fusion processes, which will boost confidence in the final products. Rigorous validation and benchmarking of data fusion approaches with ground-based accuracy assessment and independent datasets are crucial. Finally, LiDAR data fusion studies should promote open data initiatives and foster collaboration among researchers, institutions, and data providers. This would facilitate access to diverse datasets and accelerate data fusion research, which in turn would support data fusion methods and solutions that can operate in real time, especially for applications requiring quick and up-to-date information.
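
To make the height-distribution metrics named above (skewness, kurtosis, and the coefficient of variation of vegetation height) concrete, the following minimal sketch computes them for the LiDAR returns falling in a single raster cell. It is an illustration only: the function name `height_metrics`, the `min_height` ground cutoff, and the use of population (biased) moments are assumptions made here, not definitions taken from [183] or [184].

```python
import numpy as np


def height_metrics(heights, min_height=0.0):
    """Summarize the vertical distribution of vegetation heights in one cell.

    heights    : above-ground heights of the returns in the cell (metres).
    min_height : hypothetical cutoff; returns at or below it are treated
                 as ground and excluded (an assumption of this sketch).
    """
    h = np.asarray(heights, dtype=float)
    h = h[h > min_height]
    if h.size < 2:
        raise ValueError("need at least two vegetation returns")

    mean = h.mean()
    std = h.std()  # population standard deviation (ddof=0)
    if std == 0.0:
        raise ValueError("all heights are identical; moments are undefined")

    z = (h - mean) / std  # standardized heights
    return {
        "cv": std / mean,                     # coefficient of variation
        "skewness": float(np.mean(z ** 3)),   # third standardized moment
        "kurtosis": float(np.mean(z ** 4) - 3.0),  # excess kurtosis
    }


if __name__ == "__main__":
    # Toy heights for one cell; real input would come from a normalized point cloud.
    print(height_metrics([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Applying such a function to every cell of a regular grid would yield one raster layer per metric, in line with the raster-based delivery that both papers propose. The choice of cell size and ground cutoff would still need to be agreed on for the resulting layers to be comparable across studies.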

Conclusion

This paper presents a comprehensive review of LiDAR data fusion research for forest observations over the last decade. Our structured review indicates that there has been a slight upward trend in the number of publications on LiDAR data fusion for forestry observations, and that aerial platforms (both UAVs and airborne platforms) continue to be the most widely used option. We conclude that multi-sensor LiDAR data fusion has the potential to improve forest observations in a great variety of applications. We suggest a clear definition of the term "data fusion" to avoid confusion among the commonly used terms "data fusion", "data combination", and "data integration". The review further highlights that data fusion poses several challenges, including costs, computational effort, processing times, variability in data quality and spatial resolution, and a need for specialized expertise. Therefore, practitioners must carefully weigh the potential benefits of LiDAR data fusion against the actual need for such benefits and the accompanying cost.