1 Introduction

Diameter at breast height (DBH, 1.3 m above ground level) is the explanatory variable most commonly used in single- and multiple-entry equations to predict tree-level attributes, mainly because it is easy to measure in the field and is strongly related to many forest variables (Burkhart and Tomé 2012). The empirical diameter distribution (specified by the DBH measurements within the stand) is one of the most descriptive and important characteristics for forest managers because it provides information about stand structure and inputs for forest growth models and enables economic assessment of timber value and development of management schedules (Bollandsås and Næsset 2007; Kangas et al. 2007; Pascual et al. 2013).

Diameter distributions can be represented using a discrete density (or frequency) histogram, a continuous probability density function (PDF) (or the equivalent cumulative distribution), or a list of quantiles or percentiles. In forestry practice, DBH measurements are not always available, and the diameter distribution must therefore be predicted by using stand attributes as explanatory variables (e.g., density, site index, age, mean tree size), usually under the assumption that it follows a specified theoretical PDF (Liu et al. 2004; Maltamo and Gobakken 2014). Although numerous PDFs have been used to describe unimodal diameter distributions (e.g., Charlier, Normal, Log-normal, Exponential, Beta, Gamma, Pearl-Reed, SB Johnson, Weibull), the Weibull function is the most frequently used for managed, even-aged stands (e.g., Poudel and Cao 2013). Specifically, the two-parameter formulation of the Weibull function has proven simple to use, yet flexible enough to describe different shapes of unimodal distributions (e.g., Maltamo et al. 1995; Gorgoso et al. 2007).

Two parametric methods are available for predicting diameter distributions from field-derived stand variables: (i) the parameter prediction method (PPM), which directly models the PDF parameters as a function of stand variables, and (ii) the parameter recovery method (PRM), which recovers the PDF parameters from moments (moment-based PRM) or percentiles (percentile-based PRM) of the diameter distribution, which are expressed as functions of stand-level attributes (Hyink and Moser 1983). The moment-based PRM is usually preferred because it guarantees that the sum of the disaggregated stem density and basal area obtained by the Weibull function equals the stand stem density and stand basal area, respectively, resulting in numerical compatibility (e.g., Hyink and Moser 1983; Siipilehto and Mehtätalo 2013).

In the last 20 years, airborne LiDAR has been increasingly used for forest inventories at different scales (Yu et al. 2011), because of its capacity to provide spatially explicit detailed three-dimensional information about the size and structure of the forest canopy over entire areas (Reitberger et al. 2008; Wagner et al. 2008). Canopy cover and tree height are the variables most closely related to LiDAR data, as LiDAR data is mainly affected by the vertical distribution of the canopy layers (Maltamo and Gobakken 2014). However, LiDAR metrics are also related to characteristics of the diameter distributions (Maltamo and Gobakken 2014).

Three approaches have been considered for estimating the diameter distribution from LiDAR data, within the framework of parametric prediction (Maltamo and Gobakken 2014). The first uses regression analysis to directly relate LiDAR metrics to the PDF parameters (e.g., Breidenbach et al. 2008; Thomas et al. 2008) or the moments or percentiles of the diameter distribution, which are then used to recover the PDF parameters (e.g., Gobakken and Næsset 2004, 2005; Bollandsås and Næsset 2007). The second approach considers modeling the PDF parameters from stand-level variables predicted using area-based LiDAR metrics (e.g., Maltamo et al. 2006, 2007; Holopainen et al. 2010). It requires two equations: one to relate the stand variables to LiDAR metrics and another to relate the PDF parameters to the estimated stand variables, implying model error accumulation and cross-correlated residuals. The third approach predicts the diameter distribution on the basis of recognition of individual trees (Hyyppä and Inkinen 1999; Persson et al. 2002; Villikka et al. 2007), which requires high pulse densities (usually more than 5 pulses m−2: Bollandsås and Næsset 2007). In this case, only the dominant tree layer is usually detected (Næsset et al. 2004); for the tallest trees, the DBH for a given height is more variable and the relationship between DBH and height is weaker (Maltamo et al. 2004) and affected by site factors (Maltamo and Gobakken 2014). Because the first approach does not have any of the above disadvantages, and as Peuhkurinen et al. (2011) have demonstrated its superiority for predicting diameter distributions in even-aged stands, we selected this approach for use in the present study.

We aimed to predict the diameter distributions in Pinus radiata D. Don plantations in Galicia (NW Spain) by using height and canopy cover LiDAR metrics from a small-footprint, discrete-return system. For this purpose, we used the moment-based PRM in combination with the two-parameter Weibull function. LiDAR metrics were obtained from low-density LiDAR data provided by the Spanish countrywide PNOA (Plan Nacional de Ortofotografía Aérea, www.pnoa.ign.es) project.

2 Material and methods

2.1 Study area and data

The study was conducted over the main distribution area of P. radiata in Galicia (NW Spain), i.e., the province of Lugo (Fig. 1). The forests under study are representative of P. radiata stands in NW Spain and are thus mainly characterized by high planting density, low-intensity silvicultural treatments, and the presence of moderate shrub fuel loads (Castedo-Dorado et al. 2012).

Fig. 1 Maps showing the locations of inventory field plots. (a) The presence of Pinus radiata in Galicia (source: Fourth Spanish National Inventory) and administrative boundaries of the provinces included in the region of Galicia. (b) Centroids of the 25 field plots established in the province of Lugo

The field data used for modeling the diameter distributions were obtained from two different sources. The first source (A) comprises a network of 10 rectangular plots (600 to 1000 m2 in size, depending on stand density) established for growth modeling purposes. The inventory design was thus focused on obtaining an adequate representation of the existing range of ages, stem densities, and site indices (for details, see Castedo-Dorado et al. 2007). The second source (B) comprises 15 rectangular plots (1000 m2 in size) established for assessing the influence of thinning treatments on crown fire potential. The inventory design was deliberately focused on representing young and highly stocked stands, as these are usually fire prone (see Gómez-Vázquez et al. 2013, for details). Although both networks of plots cover a larger area, only the abovementioned 25 plots were selected for this study because they were re-measured close to the PNOA LiDAR flight date.

For all plots, DBH and total tree height were measured in all trees with a caliper and a Vertex III hypsometer, respectively. In addition, the UTM coordinates of the four corners of each plot were obtained from topographic surveys by using a total station and a differential GPS.

The following stand variables were calculated for each plot: stand density (N, stems ha−1), stand basal area (G, m2 ha−1), average height (Hm, m), dominant height (H, m, defined as the average height of the 100 largest-DBH trees per hectare), site index (S, defined as the dominant height at a reference age of 20 years, using the height growth model developed by Diéguez-Aranda et al. 2005), arithmetic mean diameter (dm, cm), and quadratic mean diameter (dg, cm). Stand age (t, years) was determined from the plantation date. Additionally, the empirical diameter distribution was obtained for each plot from field measurements, as well as the empirical weighted distributions for tree basal area (g-weighted) and tree volume (v-weighted). While g was calculated directly from the diameter measurements, tree volume had to be estimated from the diameter by predicting height with a height–diameter model (Castedo-Dorado et al. 2006) and subsequently estimating the volume with a stem taper function (Diéguez-Aranda et al. 2009).
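The plot-level quantities N, G, dm, and dg defined above follow directly from the callipered diameters and the plot area. The following is a minimal sketch of that arithmetic, not the authors' code; the function name `stand_variables` and the example diameters are hypothetical.

```python
import math

def stand_variables(dbh_cm, plot_area_m2):
    """Return N (stems/ha), G (m2/ha), dm (cm) and dg (cm) for one plot."""
    n = len(dbh_cm)
    area_ha = plot_area_m2 / 10_000.0
    stems_per_ha = n / area_ha
    # Tree basal area in m2 is pi/4 * (d/100)^2 when d is in cm.
    basal_area = sum(math.pi / 4.0 * (d / 100.0) ** 2 for d in dbh_cm) / area_ha
    d_mean = sum(dbh_cm) / n                             # arithmetic mean diameter
    d_quad = math.sqrt(sum(d * d for d in dbh_cm) / n)   # quadratic mean diameter
    return {"N": stems_per_ha, "G": basal_area, "dm": d_mean, "dg": d_quad}

# Hypothetical diameters (cm) on a 1000 m2 plot, for illustration only.
plot = stand_variables([18.0, 22.5, 25.0, 30.5, 27.0], plot_area_m2=1000.0)
```

Under these definitions G equals (π/40000)·dg²·N, which is the relation used when dg is calculated from G and N.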

Table 1 shows the summary statistics of the tree and stand variables. Figure 2 shows a scatter plot matrix for the variables t, N, H, and S, which reveals that field data cover the entire duration of stand development, considering the rotations usually applied to this species in Galicia (on average around 25 years in private forests; Rodríguez et al. 2002).

Table 1 Summarized field and LiDAR data for the 25 sample plots used
Fig. 2 Pairwise scatter diagrams of age (t), stand density (N), dominant height (H), and site index (S) for the field plots. Filled circles and empty circles represent data from sources A and B, respectively (see text for details)

LiDAR data were acquired for the PNOA project under the direction of the Spanish Ministry of Development (Dirección General del Instituto Geográfico Nacional and Centro Nacional de Información Geográfica), between 5 September and 29 October 2009, with an RIEGL LMS-Q680 sensor, operated at 1064 nm, pulse repetition rate of 70 kHz, scan frequency of 46 Hz, maximum scan angle of ± 30°, and average flying height of 1300 m above sea level. A maximum of 4 returns per pulse were registered, with a theoretical laser pulse density required for the PNOA project of 0.5 first returns per square meter. Summary statistics of the LiDAR return density per square meter within the plots are shown in Table 1.

2.2 Extraction of LiDAR metrics

We used FUSION V. 3.50 software (McGaughey 2015) to filter and interpolate the data and generate the digital elevation model (DEM) and normalized height of the LiDAR data cloud (NHD). We used LiDAR data within the limits of the 25 field plots to calculate metrics related to the height distribution and canopy closure using the returns from above 1 m, following the steps described in González-Ferreiro et al. (2017) (see Table 2 for details of the LiDAR-derived metrics).

Table 2 Potential explanatory variables related to height distribution and canopy closure

2.3 Method of moments for recovery of Weibull parameters

The PDF of the two-parameter Weibull, considering x as a continuous random variable, is expressed as follows (Bailey and Dell 1973):

$$ f(x)=\left(\frac{c}{b}\right)\cdot {\left(\frac{x}{b}\right)}^{c-1}\cdot \exp \left(-{\left(\frac{x}{b}\right)}^c\right) $$
(1)

where f(x) is the probability density at x, b is the scale parameter, and c is the shape parameter.
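Eq. 1 and its cumulative form, F(x) = 1 − exp(−(x/b)^c), can be written in a few lines. This is only an illustrative sketch; the function names are hypothetical and the parameter values in the example are not taken from the study.

```python
import math

def weibull_pdf(x, b, c):
    """Two-parameter Weibull density (Eq. 1); b is the scale, c is the shape."""
    if x <= 0:
        return 0.0  # diameters are positive; this also avoids 0 ** negative
    return (c / b) * (x / b) ** (c - 1) * math.exp(-((x / b) ** c))

def weibull_cdf(x, b, c):
    """Cumulative distribution function corresponding to Eq. 1."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / b) ** c))

# Hypothetical parameters: share of trees with DBH below 30 cm.
share_below_30 = weibull_cdf(30.0, b=25.0, c=3.5)
```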

In the method of moments, the parameters of the Weibull density function are recovered from the first two moments of the diameter distribution: arithmetic mean diameter (dm) and diameter variance (σd²) (Newby 1980; Burk and Newberry 1984). Thus, the following expressions were used to recover parameters b and c:

$$ {d}_{\mathrm{m}}=b\cdot \varGamma \left(1+\frac{1}{c}\right) $$
(2)
$$ {d}_{\mathrm{g}}^2=\frac{d_{\mathrm{m}}^2}{\varGamma^2\left(1+\frac{1}{c}\right)}\cdot \varGamma \left(1+\frac{2}{c}\right) $$
(3)

where dg is related to the second moment of the diameter distribution through the expression

$$ {\sigma_{\mathrm{d}}}^2={d_{\mathrm{g}}}^2-{d_{\mathrm{m}}}^2 $$
(4)

and Γ(i) is the Gamma function evaluated at i.

When using stand-level field inventory data, the recovery procedure relies on dm being estimated from dg and other stand-level attributes (e.g., t, N, H, S); dg, in turn, can be directly calculated from G and N. Within the LiDAR data framework, these stand-level variables can be replaced by LiDAR metrics as explanatory variables to estimate dg and dm. Once these variables are estimated, parameters b and c can be obtained by solving the system of Eqs. 2 and 3.
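Solving Eqs. 2 and 3 has no closed form for c, but the ratio dg²/dm² = Γ(1 + 2/c)/Γ²(1 + 1/c) decreases monotonically with c, so a simple bisection is enough. The sketch below assumes that approach; the solver, the function name `recover_weibull`, and the search bounds are illustrative choices, not the routine used in the study.

```python
import math

def recover_weibull(dm, dg, c_lo=0.1, c_hi=200.0, tol=1e-10):
    """Recover Weibull scale b and shape c from dm and dg (Eqs. 2 and 3)."""
    if not 0 < dm < dg:
        raise ValueError("recovery requires 0 < dm < dg")
    target = 2.0 * math.log(dg / dm)  # ln(dg^2 / dm^2)

    def log_ratio(c):
        # ln[Gamma(1 + 2/c) / Gamma(1 + 1/c)^2], decreasing in c
        return math.lgamma(1.0 + 2.0 / c) - 2.0 * math.lgamma(1.0 + 1.0 / c)

    if not log_ratio(c_hi) < target < log_ratio(c_lo):
        raise ValueError("shape parameter lies outside the search bounds")
    lo, hi = c_lo, c_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if log_ratio(mid) > target:
            lo = mid  # ratio still too large: c must be larger
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    b = dm / math.gamma(1.0 + 1.0 / c)  # Eq. 2 solved for b
    return b, c

# Hypothetical stand with dm = 22.0 cm and dg = 23.1 cm.
b_hat, c_hat = recover_weibull(22.0, 23.1)
```

Because b and c are obtained from dm and dg exactly, the recovered distribution reproduces both moments, which is the compatibility property that motivates the moment-based PRM.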

We have also considered another option whereby parameters b and dg were modeled through LiDAR metrics, and parameter c was then recovered. However, as poorer results were obtained than with the abovementioned methodology, this option was ruled out for further analyses.

2.4 Regression models

We used a linear model to establish the empirical relationship between dg and LiDAR metrics:

$$ {d}_{\mathrm{g}}={\alpha}_0+{\alpha}_1{X}_1+{\alpha}_2{X}_2+\dots +{\alpha}_n{X}_n+\varepsilon $$
(5)

where X1, X2, …, Xn are potential explanatory variables related to the LiDAR-derived height distribution and canopy closure (Table 2); α0, α1, …, αn are the parameters to be estimated in the fitting process; and ε is the additive error term, which is assumed to be independent and normally and identically distributed with zero mean.

For a given stand, dm is always smaller than or equal to dg, and we therefore used the following model expression to take this restriction into account (Frazier 1981):

$$ {d}_{\mathrm{m}}={d}_{\mathrm{g}}-\exp \left({\beta}_0+{\beta}_1{Y}_1+{\beta}_2{Y}_2+\dots +{\beta}_m{Y}_m\right) $$
(6)

Finally, we applied a natural logarithmic transformation to Eq. 6 to linearize the model and facilitate selection of the independent variables:

$$ \ln \left({d}_{\mathrm{g}}-{d}_{\mathrm{m}}\right)={\beta}_0+{\beta}_1{Y}_1+{\beta}_2{Y}_2+\dots +{\beta}_m{Y}_m+\varepsilon $$
(7)

where Y1, Y2, …, Ym are potential explanatory variables related to the LiDAR-derived height distribution and canopy closure (Table 2); β0, β1, …, βm are the parameters to be estimated; and ε is as defined above.
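The exponential term in Eq. 6 is always positive, so the predicted dm can never exceed dg, and the gap dg − dm is the quantity that can be log-transformed for variable selection. A minimal sketch follows; the function names and the coefficient and metric values are hypothetical and are not the fitted values of this study.

```python
import math

def predict_dm(dg, betas, y):
    """Eq. 6: dm = dg - exp(beta0 + beta1*Y1 + ... + betam*Ym)."""
    linear = betas[0] + sum(bj * yj for bj, yj in zip(betas[1:], y))
    return dg - math.exp(linear)

def linearized_response(dm, dg):
    """Response of the linearized model: ln(dg - dm), defined because dm < dg."""
    return math.log(dg - dm)

# Hypothetical coefficients and a single hypothetical LiDAR metric value.
dm_hat = predict_dm(dg=24.0, betas=(-1.2, 0.05), y=(10.0,))
```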

2.5 Model fitting and selection

In the first step, we applied a stepwise selection procedure to select the best subset of independent variables to be included in Eqs. 5 and 7. We used a combination of forward and backward algorithms for variable selection implemented in the regsubsets function of the leaps package (Lumley and Miller 2017) of the R statistical software (R Core Team 2016). We selected those models with the lowest values of the Bayesian information criterion (BIC: Schwarz 1978), with no problems related to multicollinearity between explanatory variables (i.e., those with a condition index below 30; Belsley 1991), and with all parameter estimates significant at the 5% level.

In the second step, we fitted the system of two equations (Eqs. 5 and 6), considering LiDAR metrics selected in the first step as exogenous variables (i.e., obtained outside the system) and dm and dg as endogenous variables (i.e., variables that the model is intended to predict; Borders 1989). As the endogenous variable dg occurs on both sides of the equations, cross-equation correlation between error components is expected. Therefore, biased and inconsistent parameter estimations would be obtained using the ordinary least-squares technique (Borders and Bailey 1986; Borders 1989). Accordingly, the system of equations was fitted simultaneously by a three-stage least-squares method (3SLS: Zellner and Theil 1962), which combines two-stage least squares (2SLS) with seemingly unrelated regression (SUR), taking the cross-equation error correlations into account. For this purpose, we used the nlsystemfit function of the systemfit package (Henningsen and Hamann 2007) of R (R Core Team 2016). We used the coefficient of determination (R2) and the root mean square error (RMSE) to evaluate the goodness of fit of the models.

2.6 Accuracy assessment

We applied the Kolmogorov–Smirnov (KS) test, which compares theoretical and empirical (field-observed) diameter distributions, to assess the suitability of the two-parameter Weibull function for predicting the diameter distribution from the moment-based PRM and LiDAR metrics. As the diameter distribution parameters are estimated from empirical information (LiDAR data), the estimated distribution is not a fully specified theoretical distribution. For this case, Lilliefors (1967) stated that the standard distribution of the KS statistic is no longer valid and that the distribution should be obtained by Monte Carlo simulation. Therefore, for each plot, we generated 10,000 independent identically distributed pseudo-random samples under the null hypothesis: we used the rweibull function of R (R Core Team 2016) to generate random samples with a size equal to the number of observations of the corresponding plot and with the recovered parameters (from field or from LiDAR information), and then computed the KS statistic for each sample. This enabled estimation of the distribution of the KS statistic under the null hypothesis for each plot. If the KS statistic value obtained from the comparison between the estimated and empirical distribution of a plot exceeds the critical value at a specified significance level (obtained from the approximated distribution of the KS statistic), the hypothesis that the observations belong to a Weibull distribution with the specified parameters should be rejected. The significance level was established at 5%.
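The Monte Carlo step can be sketched as follows. This is an illustration in Python rather than the R code used in the study, and it assumes that each simulated sample is compared against the Weibull CDF with the same fixed parameters; whether parameters were re-estimated for each simulated sample is not stated here. The function names, the seed, and the example values are hypothetical.

```python
import math
import random

def weibull_cdf(x, b, c):
    return 1.0 - math.exp(-((x / b) ** c)) if x > 0 else 0.0

def ks_statistic(sample, b, c):
    """Largest gap between the empirical CDF and the Weibull(b, c) CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = weibull_cdf(x, b, c)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def ks_critical_value(n, b, c, n_sim=10_000, alpha=0.05, seed=0):
    """Simulated (1 - alpha) quantile of the KS statistic for samples of size n."""
    rng = random.Random(seed)
    stats = sorted(
        # random.weibullvariate takes (scale, shape), i.e. (b, c)
        ks_statistic([rng.weibullvariate(b, c) for _ in range(n)], b, c)
        for _ in range(n_sim)
    )
    return stats[min(n_sim - 1, math.ceil((1.0 - alpha) * n_sim) - 1)]

# Hypothetical plot with 40 trees; fewer simulations than in the text, for speed.
critical = ks_critical_value(n=40, b=25.0, c=3.5, n_sim=2000)
```

A plot would then be rejected when `ks_statistic(observed_dbh, b, c)` exceeds `critical`.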

In addition, the performance of the methodology was also evaluated on the basis of numerical and graphical analyses. We used the former type of analysis to assess the RMSE obtained for prediction of DBH, G, and V from predicted diameter distributions (from field variables and LiDAR metrics). The predicted variables were obtained using the following procedures: (1) for DBH, we predicted the diameter values of a plot by applying the inverse of the diameter distribution function (i.e., the quantile function) over the empirical distribution function values; (2) for G, we integrated the diameter density function multiplied by squared diameter (to obtain the expected value of quadratic mean diameter), subsequently obtaining G by direct calculation from dg and N; and (3) for V, we integrated the diameter density function multiplied by the tree volume of the corresponding diameter, scaled by the number of trees per hectare. For graphical analysis, we plotted the unweighted and weighted predicted diameter distributions (from field and LiDAR variables) against the empirical diameter distributions, for visual assessment of the prediction accuracy.
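Procedures (1) and (2) can be illustrated with the Weibull quantile function Q(p) = b·(−ln(1 − p))^(1/c) and with the closed form of the integral of d²·f(d), which equals b²·Γ(1 + 2/c). The sketch below rests on two assumptions that are not taken from the text: plotting positions (i + 0.5)/n are used for the empirical distribution function so that p stays below 1, and the closed form replaces numerical integration. All names and data values are hypothetical.

```python
import math

def weibull_quantile(p, b, c):
    """Inverse Weibull CDF: diameter below which a fraction p of trees falls."""
    return b * (-math.log(1.0 - p)) ** (1.0 / c)

def predicted_dbh(n_trees, b, c):
    # Assumed plotting positions (i + 0.5) / n keep p strictly below 1.
    return [weibull_quantile((i + 0.5) / n_trees, b, c) for i in range(n_trees)]

def basal_area_per_ha(b, c, stems_per_ha):
    """G (m2/ha) from the expected squared diameter (cm2) and stand density."""
    dg_squared = b ** 2 * math.gamma(1.0 + 2.0 / c)
    return math.pi / 40000.0 * dg_squared * stems_per_ha

def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical plot: sorted field diameters (cm) versus Weibull-based predictions.
observed = sorted([14.2, 17.8, 19.5, 21.0, 22.4, 24.1, 26.3, 29.0])
predicted = predicted_dbh(len(observed), b=23.0, c=4.0)
dbh_rmse = rmse(observed, predicted)
g_hat = basal_area_per_ha(b=23.0, c=4.0, stems_per_ha=900.0)
```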

2.7 KS acceptance/rejection prediction

After applying the KS test and comparing the estimated unweighted and weighted distributions with the empirical distributions, we used a classification tree to relate the result of the KS test for each plot (null hypothesis accepted or rejected) with the measured stand-level attributes N, G, H, V, S, t, dm, and dg. The aim of this analysis is to search for common properties in accepted and rejected plots. If any patterns are observed, they could be used in field data stratification, thus increasing the efficiency of the diameter distribution modeling approach (e.g., Thomas et al. 2008).

Moreover, we used the random forest (RF) approach to examine the influence of stand variables on the suitability of the two-parameter Weibull PDF for characterizing the empirical diameter distribution. The relevance of each stand variable in RF was calculated by analyzing the changes in the classification error when the values of the variable are randomly permuted; if the effect is large, the variable is assigned greater importance (Reif et al. 2006). We implemented the classification tree analyses and RF using the R software packages rpart (Therneau et al. 2017) and randomForest (Liaw and Wiener 2002) (R Core Team 2016).
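With a single retained predictor, a classification tree reduces to one threshold on that variable. The sketch below shows a CART-style search for the split that minimizes the weighted Gini impurity; it is a simplified stand-in written for illustration and does not reproduce the rpart or randomForest implementations. The function names and the heights and KS outcomes in the example are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = accepted by the KS test)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return (threshold, weighted Gini) of the best split 'value >= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [lab for val, lab in pairs if val < threshold]
        right = [lab for val, lab in pairs if val >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical dominant heights (m) and KS outcomes, for illustration only.
heights = [12.0, 15.0, 18.0, 21.0, 24.0, 27.0]
accepted = [0, 0, 0, 0, 1, 1]
threshold, impurity = best_split(heights, accepted)
```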

Data availability

LiDAR data are freely available at http://mapas.xunta.gal/visores/descargas/ and http://centrodedescargas.cnig.es/CentroDescargas/buscadorCatalogo.do?codFamilia=LIDAR. Field datasets generated and analyzed during the current study are not publicly available because the authors are still using them in other research activities, but they are available from the corresponding author on reasonable request.

3 Results

Table 3 summarizes the parameter estimates and goodness-of-fit statistics of the simultaneous fitting of the system of Eqs. 5 and 6. Note that the intercept was not included in Eq. 5 because it was not significant at the 5% level. The fitted models explained 80 and 77% of the observed variability in dg and dm, respectively, with RMSE values of 3.422 and 3.626 cm, respectively.

Table 3 Parameter estimates and goodness-of-fit statistics of the simultaneous fitting of Eqs. 5 and 6 used to estimate dg and dm from LiDAR data

Table 4 shows the plots with diameter distribution adequately estimated according to the KS test results. The percentage of plots in which the null hypothesis was accepted (i.e., the estimated distribution is equal to the empirical distribution) varied between 28 and 40% (7, 9, and 10 plots for the unweighted, g-weighted, and v-weighted distributions, respectively; see Table 4) when we used LiDAR data to estimate dg and dm, subsequently recovering the distribution parameters. In six plots, the null hypothesis was accepted for all unweighted and weighted diameter distributions estimated from LiDAR data, while it was rejected for all cases in 14 plots. Comparatively, the percentage of acceptance increased up to 96–100% when we used field data (i.e., real values of dg and dm) in the parameter recovery process (the only case of rejection was plot number 8 for the unweighted diameter distributions).

Table 4 Results of the Kolmogorov–Smirnov test (α = 0.05) for the 25 plots

Figure 3 shows the v-weighted observed cumulative relative frequency of each plot and the corresponding estimated distributions obtained from parameters recovered using field data (real information) and LiDAR data (the corresponding graphs for the unweighted and g-weighted distributions are included in Supplementary Figure 1a and b, respectively). We can observe that the empirical distribution is adequately described for the six plots that passed the KS test in all cases (see Table 4). On the other hand, the 14 plots where the null hypothesis was always rejected usually display bias at a coarse scale.

Fig. 3 Plots of cumulative relative frequencies against diameter at breast height (DBH) for v-weighted diameter distributions. The continuous lines represent field measurements (empirical distribution); the dashed lines represent the diameter distribution function estimated from field data; and the filled dots represent the diameter distribution function estimated from LiDAR data

Comparison between the empirical distributions and the diameter distributions estimated using LiDAR metrics as predictors revealed RMSE values of 10.85 m2 ha−1 and 96.93 m3 ha−1 for G and V (g- and v-weighted distributions), respectively. In contrast, RMSE values of < 0.001 m2 ha−1 and 17.36 m3 ha−1 were obtained for the same variables when the distributions were recovered from field data.

Because the best results of the KS test were obtained for the estimated v-weighted distributions, the classification tree was fitted for the groups obtained for these diameter distributions (15 plots were rejected and 10 plots were accepted). The results showed that H was the only predictor retained in the tree, with a split threshold of 22.6 m. Sample plots with values of H equal to or higher than 22.6 m were classified as accepted by the KS test; otherwise, they were classified as rejected. Application of this threshold yielded correct classification of 84% of the sample plots. These results are consistent with those obtained with the RF approach, in which the two most important stand variables were dominant height (H) and stand density (N).

4 Discussion

Many countries throughout the world have completed countrywide airborne LiDAR surveys in recent years (e.g., Denmark, Kortforsyningen; Finland, National Land Survey of Finland; Netherlands, Actueel Hoogte Bestand Nederland; Slovenia, Slovenian Environment Agency; Spain, Instituto Geográfico Nacional; Switzerland, Federal Office of Topography), with the main aim of producing high-resolution terrain maps (Ahokas et al. 2005). However, the flight parameters used in these surveys are not usually considered as optimal for quantifying natural resources, since low density, high flight height, and large scan angle are used in order to reduce the associated costs (González-Ferreiro et al. 2014). Nevertheless, this type of data has also proven useful for forest inventories (e.g., Villikka et al. 2012), practical forest management (e.g., Valbuena et al. 2016), and ecological applications (e.g., Vihervaara et al. 2015), among others.

In Spain, the low-density LiDAR data obtained in the PNOA project have proved useful for assessing numerous forest variables such as stand volume (Guerra-Hernández et al. 2016b), stand basal area (Guerra-Hernández et al. 2016b), Lorey’s mean height (González-Ferreiro et al. 2014; Guerra-Hernández et al. 2016b), canopy fuel variables (González-Ferreiro et al. 2014, 2017), fire severity (Montealegre et al. 2014), and biomass (Guerra-Hernández et al. 2016a, b). However, these data have not yet been used to estimate diameter distributions.

In this study, we first had to model dg and dm from LiDAR metrics, as these variables are needed to recover the parameters of the two-parameter Weibull function. The goodness-of-fit statistics obtained in the fitting phase were similar to others reported in the international literature. For example, for German forests dominated by Picea abies (L.) Karst., Breidenbach et al. (2008) used data from a 0.44 pulse m−2 LiDAR flight and reported an RMSE of 2.44 cm for dm, while Treitz et al. (2012) studied a broad range of forest types (coniferous and hardwoods) and conditions across Ontario by using an artificially reduced LiDAR dataset of 0.5 pulses m−2 and reported RMSE values ranging from 0.76 to 4.3 cm for dg.

Conversely, the comparison between the estimated and the observed diameter distributions provided less satisfactory results, as the null hypothesis was accepted in at most 40% of the plots. There are two possible explanations for these poor results: the use of the KS test for accuracy assessment and the characteristics of the countrywide PNOA LiDAR flight. Concerning the first possibility, most studies that have modeled diameter distributions from LiDAR data have reported the standard deviation of the differences between estimated and empirical values (e.g., Gobakken and Næsset 2004) rather than the KS test results. In this sense, Magnussen and Renaud (2016) used multidimensional scaling to estimate diameter distribution and considered differences less than 2.70 cm in diameter estimation as not relevant for practical applications, downplaying rejection rates in the KS test. In the present study, computation of the differences between estimated and empirical values revealed that the mean error for individual tree diameter estimations was 3.37 cm for the unweighted distribution predicted from LiDAR data. This error appears acceptable, considering the precision obtained from field measurements with, e.g., the laser relascope (up to 1.6 cm, Kalliovirta et al. 2005) or a laser dendrometer (up to 0.9 cm, Parker and Matney 1999). Although these errors are proportionally smaller than those reported in the present study, field-measured diameters are usually obtained from a small number of sample plots, and larger errors are expected in the extrapolation process, while LiDAR data allow complete coverage of the area of interest. Concerning the countrywide PNOA LiDAR data used, the acquisition flight was not specifically designed for forest inventory purposes (scanning angles of up to 30°, low-density data of 0.5 first returns m−2, and high average flying height of 1300 m above sea level). According to White et al. (2013), a minimum of 1 pulse m−2 (> 4 pulses m−2 for dense forests on complex terrain) is recommended to produce an operational LiDAR-based enhanced forest inventory. Other possible explanations could be related to the lack of silvicultural treatments and to the morphology of radiata pine. P. radiata is a shade-intolerant conifer species and has a much lower crown morphological plasticity in relation to light availability than other more shade-tolerant species such as spruce (Parent and Messier 1995), which means that the plastic changes in the canopy dimensions do not follow a regular pattern of growth and this species is less efficient in closing gaps in the canopy. This could have some implications for area-based approaches, especially when using very low LiDAR density.

Regarding the possible alternatives to the parametric PRM, non-parametric approaches such as the k-nearest neighbor method and RF regression have yielded satisfactory results for predicting diameter distributions in some area-based LiDAR studies (Packalén and Maltamo 2008; Maltamo et al. 2009; Shang et al. 2017). Nevertheless, with the relatively small number of sample plots available in the present study, such methods would probably have led to overfitting, which could negatively affect the predictive performance of the models (Hawkins 2004; Zaffalon 2005).

According to the KS test, the best performance was achieved with g- and v-weighted diameter distributions, which gave greater weighting to larger and more valuable trees. This is consistent with the findings of Gobakken and Næsset (2004), who reported that the results obtained for diameter density function weighted by g were better than those corresponding to the unweighted distribution.

Explanatory variables for dg were LRA1 and h75, while the only explanatory variable for dm was h01. LiDAR variables are usually classified as height metrics expressing the central trend in LiDAR height distribution (e.g., hmean and hmode), height metrics expressing the dispersion of LiDAR height distribution (e.g., hSD, hVAR, hAAD, hIQ, hCV, hmax, hmin), height metrics expressing the shape of LiDAR height distribution (e.g., hSkw, hKurt), percentiles of the LiDAR height distribution (e.g., h01, h10…, h95, h99), and canopy cover metrics (e.g., PFRAhmean, PFRAhmode, PARAhmean, PARAhmode, PFRA2, LRA1, FRA1, CRR). The highest percentiles of the LiDAR height distribution are commonly included as explanatory variables in most models based on the area-based approach, because they describe the mean height or the dominant height fairly well, and these field variables are very important for describing the vertical structure of a forest stand. Canopy cover metrics perform well in modeling forest variables related to stand density, such as basal area, quadratic mean diameter, canopy cover, or number of trees per hectare; these field variables are important for describing the horizontal structure of a forest stand. Thus, the presence of explanatory variables like LRA1 and h75 was expected. The inclusion of h01 is more difficult to explain, since it is one of the lowest percentiles of the LiDAR height distribution; nevertheless, Gobakken and Næsset (2004, 2005) also found strong relationships between the percentile of the first pulse laser canopy heights for 0% (m) and the logarithmic transformations of the 93rd and 24th percentiles used for diameter distribution modeling, in young and mature forests dominated by P. abies and Pinus sylvestris L.

Finally, the results of the classification tree and the RF approach indicated that dominant height may be a significant variable for field data stratification. It is remarkable that a single threshold of H = 22.6 m correctly classified 84% of the sample plots as accepted or rejected by the KS test, which suggests that this methodology performs much better in mature forest stands of P. radiata. Nonetheless, these results should be considered with caution, due to the low number of plots available for this study. The validity of these results should be verified when new data become available.

5 Conclusion

This study describes and tests an approach for predicting diameter distributions from LiDAR information, applied to a set of 25 plots established in P. radiata plantations in Galicia (NW Spain). Given the importance of the species in this region, forest managers and practitioners could benefit from better knowledge of diameter distributions for use in practical forest applications. The approach used was based on the moment-based parameter recovery method and the two-parameter Weibull function. The results showed that only 28 to 40% of the diameter distributions were accurately predicted with this method, according to the KS test. However, this may be explained by the strictness of this statistical test and/or by the characteristics of the LiDAR flight, which was not specifically designed for forest inventories. Analysis, by means of a classification tree and a random forest approach, of why the null hypothesis was accepted or rejected demonstrated the importance of dominant height. The approach described has the potential to be replicated in different species and forest types in regional or countrywide surveys, although a larger number of field inventory plots would be required. In this respect, contracts for new PNOA flights have already been tendered, and the temporal cover provided by these flights has been established at 6 years. This should encourage new research studies, given the growing interest in the use of the Spanish countrywide LiDAR dataset for performing repeated large-scale forest inventories in an economically viable way.