1 Introduction

Disturbances occurring in forest ecosystems are one of the most important determinants of the spatio-temporal development of stands (Gratzer et al. 2004). Disturbances at different spatial scales create gaps of varying sizes, and these processes have a significant effect on forest structure. The specific vertical structure is closely related to the shape of the diameter at breast height (DBH) distribution (Lawton and Putz 1988; Denslow et al. 1998). Many tree stands in various geographic regions contain cohorts of old trees, which are represented by only a few individuals but play a great role in stand structure and in ecosystem functioning. It is difficult to find distribution functions that represent these few large trees. When such highly skewed and heavy-tailed DBH distributions are approximated, a smoothing problem often arises, which in turn requires methods that are able to fit tail probabilities well.

Different models have arisen naturally across a range of problems when modelling DBH in forestry (e.g. Pretzsch 2010). Single flexible theoretical distributions (e.g. Weibull, gamma) have often been used to fit empirical DBH data more or less asymmetrically with a positive skewness (Merganič and Sterba 2006; Gove et al. 2008). Mixture distributions with a few components are an appropriate tool for modelling bi- and multimodal empirical DBH distributions (Zhang et al. 2001; Zasada and Cieszewski 2005; Podlaski 2011a, b; Zasada 2013).

In order to model the dynamics of forest stands with cohorts of old trees, new types of distribution functions are needed. A new approach for density estimation of highly skewed and heavy-tailed distributions, the gamma shape mixture (GSM) model, employs a mixture of gamma density functions with unknown weights (Venturini et al. 2008). A general Bayesian approach allows the creation of a flexible model characterised by a single parameter for all the gamma components and the ordinary set of mixture weights (Jasra et al. 2005; Venturini et al. 2008). This method significantly improves predictive performance in estimating tail probabilities compared to standard approaches employing e.g. single flexible theoretical distributions and mixture distributions with a few components (Venturini et al. 2008). A particularly important advantage of the GSM model is the possibility of using a great number of mixture components. In the case of two-generation stands where the two generations differ significantly in the number of trees, the model makes it possible, among other things, to generate random DBH data that take into account the existence of small local DBH maxima. These maxima, representing the older generation and creating longer-than-normal right tails, cannot be treated as atypical observations. These data are indispensable for correctly representing DBH distributions in two-generation stands in which the older generation is formed by single old trees. Consequently, approaches that deal with atypical observations in distributions (e.g. by identifying and then eliminating them) cannot be used.

In ecological analyses of forest dynamics based on simulation studies, one should use data sets (mainly DBH) characterising the investigated stand in particular developmental phases. Tree lists, minimally a set of DBHs with an indicator of tree species, obtained from measurements made in selected plots are used to define the initial condition. The measurement of large DBH samples is practically impossible in many forest inventories due to economic limitations (e.g. Roesch et al. 2015). In this case, the DBH distributions can be generated using theoretical functions (e.g. Thompson 2000; Gehringer and Turnblom 2014).

Forest growth models based on progressing distributions are characterised by the inclusion of stand heterogeneity in the simulation approach, providing information on tree dimensions (e.g. Porté and Bartelink 2002). The accuracy of such models is primarily determined by the flexibility of the underlying type of theoretical function (e.g. Pretzsch 2010). Stand development is presented as a periodic progression of the frequency distributions. Each developmental phase is represented by a theoretical function of specified parameters. By changing these parameters, the DBH distribution can be shifted along the time axis. The DBH data generation makes it possible to increase the number of DBHs for small samples and then allows comparison of the model outputs with independent data.

Procedures based on Markov chain Monte Carlo (MCMC) techniques are frequently used to generate random numbers from probability distributions (Liu 2001). Monte Carlo methods have become one of the most important tools for sampling from complex distributions (e.g. Liu 2001; Robert and Casella 2004). There are several classes of Monte Carlo techniques, e.g. MCMC techniques with Metropolis–Hastings sampling, sequential Monte Carlo techniques that include, for example, sequential importance resampling or particle filtering (Kong et al. 1994), and the more recently developed methods with equi-energy sampling (Kou et al. 2006).

The aims of this study are (1) to compare the precision of the approximation of empirical DBH data employing the GSM model and kernel density estimation (parametric and non-parametric methods) and (2) to assess the suitability of two methods for generating random DBH data from the GSM model: (a) the procedure using a multimodal distribution and gamma random numbers and (b) MCMC techniques with Metropolis–Hastings sampling. The GSM model has not been previously used for the analysis of forest data.

2 The gamma shape mixture model

The GSM model is defined as follows (Lehmann and Casella 1998; Venturini et al. 2008):

$$ f\left( x|{\pi}_1,\dots, {\pi}_J,\theta \right)=\sum_{j=1}^J{\pi}_j{f}_j\left( x|\theta \right) $$
(1)

where \( J \) is the number of mixture components (known and fixed), \( \pi_1, \dots, \pi_J \) are the mixture weights (proportions; unknown) and \( 1/\theta \) is the scale parameter for the whole GSM model (unknown). The gamma distribution \( f_j(x|\theta) \) has a probability density function (PDF) given by

$$ {f}_j\left( x|\theta \right)=\frac{\theta^j}{\Gamma (j)}{x}^{j-1}{e}^{-\theta\;x} $$
(2)

Each gamma distribution in the GSM model is indexed by a component-specific shape parameter (j) and has a single scale parameter (1/θ).
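Equations (1) and (2) can be evaluated directly. The following is a minimal sketch in Python (not the R code used in this study); the function name `gsm_pdf` and its argument layout are illustrative assumptions. Each component density is computed on the log scale, because for a large number of components the terms \( \theta^j \) and \( \Gamma(j) \) overflow ordinary floating-point arithmetic when evaluated separately.

```python
import math


def gsm_pdf(x, weights, theta):
    """Density of the gamma shape mixture, Eqs. (1)-(2), at a point x.

    weights[j-1] is the mixture weight pi_j of the gamma component with
    integer shape j and rate theta (scale 1/theta). Illustrative helper,
    not part of any package.
    """
    if x <= 0:
        return 0.0  # the gamma components have support on x > 0 only
    log_x = math.log(x)
    log_theta = math.log(theta)
    total = 0.0
    for j, pi_j in enumerate(weights, start=1):
        if pi_j == 0.0:
            continue
        # log f_j(x | theta) = j log(theta) - log Gamma(j) + (j-1) log(x) - theta x
        log_f = j * log_theta - math.lgamma(j) + (j - 1) * log_x - theta * x
        total += pi_j * math.exp(log_f)
    return total
```

With a single component (J = 1, weight 1), the function reduces to the exponential density \( \theta e^{-\theta x} \), which is a convenient sanity check.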

The GSM model could also be defined as follows (Venturini et al. 2008):

$$ p\left({x}_1,\dots, {x}_n|{z}_1,\dots, {z}_n,\theta \right)=\frac{\theta^{\sum_{i=1}^n{z}_i}}{\prod_{i=1}^n\Gamma \left({z}_i\right)}\left(\prod_{i=1}^n{x}_i^{z_i-1}\right){e}^{-\theta \sum_{i=1}^n{x}_i} $$
(3)

where \( z_1, \dots, z_n \) are the missing elements of the sample (Dempster et al. 1977; Diebolt and Robert 1994). Given \( x_1, \dots, x_n \), an integer \( z_i \) between 1 and \( J \) can be associated with each \( x_i \); this auxiliary variable identifies the mixture component that generated observation \( x_i \).

A general Bayesian approach is often used to estimate the unknown parameters of the GSM model. The weights \( \pi_1, \dots, \pi_J \) and \( \theta \) are independent a priori, and the following conjugate prior distributions are specified (Venturini et al. 2008):

$$ {\pi}_1,\ldots,{\pi}_J\sim {D}_J\left(\frac{1}{J},\dots, \frac{1}{J}\right) $$
(4)

and

$$ \theta \sim G\left(\alpha, \beta \right) $$
(5)

where \( D_J(\bullet) \) is a Dirichlet distribution and \( G(\bullet) \) is a gamma distribution; \( J \), \( \alpha \) and \( \beta \) are the hyperparameters. The posterior distribution is (Venturini et al. 2008)

$$ p\left({\pi}_1,\dots, {\pi}_J,\theta |{x}_1,\dots, {x}_n,{z}_1,\dots, {z}_n\right)\propto \left(\prod_{j=1}^J{\pi}_j^{\left(1/ J\right)+{n}_j-1}\right){\theta}^{\alpha +\left(\sum_{i=1}^n{z}_i\right)-1}{e}^{-\left(\beta +\sum_{i=1}^n{x}_i\right)\theta} $$
(6)

where

$$ {n}_j=\sum_{i=1}^n\mathrm{I}\left({z}_i= j\right) $$

with \( j = 1, \dots, J \), where \( \mathrm{I}(\bullet) \) is the indicator function.
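Read as a function of the parameters with the labels held fixed, Eq. (6) factorises into a Dirichlet kernel in the weights, with parameters \( 1/J + n_j \), and a gamma kernel in \( \theta \), with shape \( \alpha + \sum z_i \) and rate \( \beta + \sum x_i \). The sketch below draws one pair of values from these two conditionals. It is written in Python with the standard library only and is an illustration of Eq. (6), not the sampler implemented in the GSM package; the function name `draw_pi_theta` is hypothetical.

```python
import random


def draw_pi_theta(x, z, J, alpha, beta, rng=random):
    """One draw of (pi_1..pi_J, theta) from the posterior in Eq. (6), given labels z.

    x: observations; z: component labels in 1..J (same length as x).
    """
    # n_j = number of observations currently allocated to component j
    counts = [0] * J
    for z_i in z:
        counts[z_i - 1] += 1
    # pi | z ~ Dirichlet(1/J + n_1, ..., 1/J + n_J), drawn as normalised gamma variates
    g = [rng.gammavariate(1.0 / J + n_j, 1.0) for n_j in counts]
    total = sum(g)
    pi = [g_j / total for g_j in g]
    # theta | x, z ~ Gamma(shape = alpha + sum z_i, rate = beta + sum x_i)
    shape = alpha + sum(z)
    rate = beta + sum(x)
    theta = rng.gammavariate(shape, 1.0 / rate)  # gammavariate expects a scale, not a rate
    return pi, theta
```

For very small Dirichlet parameters (such as 1/250 for empty components), individual gamma variates may underflow to zero; the corresponding weights are then exactly zero, which is harmless as long as at least one component is occupied.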

The posterior distribution is estimated using a Gibbs sampler; the parameter θ is handled analytically by integrating it out. After θ has been integrated out, the posterior distribution is (Venturini et al. 2008)

$$ p\left({\pi}_1,\dots, {\pi}_J|{x}_1,\dots, {x}_n,{z}_1,\dots, {z}_n\right)\propto \prod_{j=1}^J{\pi}_j^{\left(1/ J\right)+{n}_j-1} $$
(7)

The primary advantage of this strategy is that the Markov chain runs in a smaller space (Robert 1996; MacEachern et al. 1999; Venturini et al. 2008).

3 Materials and methods

3.1 Field measurements

The plots were sampled in two-generation stands with fir Abies alba Mill. and beech Fagus sylvatica L., in protected, near-natural forests in the Świętokrzyskie Mountains (Świętokrzyski National Park, 50° 50′–50° 53′ N, 21° 01′–21° 05′ E). The study area lies at an elevation between 320 and 590 m above sea level. The most common plant associations are Dentario glandulosae-Fagetum and Abietetum polonicum (nomenclature after Matuszkiewicz 2008). In these stands, 30 circular plots from 0.2 to 0.4 ha were randomly selected. The radius of each plot was chosen so that the whole plot was situated within the boundaries of a homogenous patch of similar vertical stand structure. The age of trees, determined on the basis of increment core analysis, carried out during the present study and earlier dendrochronological research, shows that in the investigated area fir and beech trees of the older generation were usually characterised by DBHs >70 cm (Podlaski 2008, 2011a, b; Podlaski and Żelezik 2012). In each plot, the DBH was measured for all living trees >6.9 cm in diameter.

3.2 Forest data

To identify similar DBH structures in the investigated plots, 21 variables were used: fractions of the tree number (10 variables) and fractions of the basal area (10 variables) in 10-cm intervals from 7 to 107 cm, and the number of main extremes of the DBH distribution (1 variable). Hierarchical cluster analysis (HCA) was employed with the Jaccard measure and Ward’s minimum variance agglomeration method. The 20 plots were clustered into three main groups (Fig. 1):

  1. Group RS includes DBH distributions showing the rotated-sigmoid (RS) shape (10 plots) (Fig. 2).

  2. Group BMS includes DBH distributions showing the typical bimodal M-shape (5 plots) (Fig. 3).

  3. Group UID includes the unimodal irregularly descending distributions (5 plots) (Fig. 4).
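The 20 fraction variables that enter the cluster analysis can be computed per plot as sketched below. This is an illustrative Python helper (the name `dbh_structure_variables` is hypothetical), and it assumes half-open classes [7, 17), [17, 27), ..., [97, 107) cm, a boundary convention that the text does not state. The 21st variable, the number of main extremes, is not covered because its exact definition is not given here.

```python
import math


def dbh_structure_variables(dbhs, lower=7.0, upper=107.0, width=10.0):
    """Fractions of tree number and of basal area per DBH class for one plot.

    dbhs: DBH values in cm. Returns two lists of length (upper - lower) / width.
    Assumes at least one tree falls inside [lower, upper).
    """
    n_classes = int(round((upper - lower) / width))
    counts = [0] * n_classes
    basal = [0.0] * n_classes
    for d in dbhs:
        if not (lower <= d < upper):
            continue  # tree outside the classified DBH range
        k = int((d - lower) // width)
        counts[k] += 1
        basal[k] += math.pi * d * d / 40000.0  # basal area in m^2 for a DBH in cm
    n_total = sum(counts)
    ba_total = sum(basal)
    number_fractions = [c / n_total for c in counts]
    basal_fractions = [b / ba_total for b in basal]
    return number_fractions, basal_fractions
```

For a toy plot with two trees of 10 and 20 cm, the number fractions are 0.5 in each of the first two classes, while the basal-area fractions are 0.2 and 0.8, which shows how the second set of variables gives more weight to large trees.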

Fig. 1 Correspondence analysis (CA) ordination diagrams (CA1 and CA2 are ordination axes); 21 variables were used in the analysis to describe empirical tree DBH distributions. a Ellipse diagram: the weighted correlation defines the direction of the principal axis of the ellipse. b Spider diagram: each point is connected to the group centroid (large black circles). Cluster RS: rotated-sigmoid DBH distributions; cluster BMS: typical bimodal M-shape DBH distributions; cluster UID: unimodal, irregularly descending DBH distributions

Fig. 2 Approximation of the empirical DBH distribution of an example stand from the group RS using the kernel density estimator and the GSM model (plot No. RS07)

Fig. 3 Approximation of the empirical DBH distribution of an example stand from the group BMS using the kernel density estimator and the GSM model (plot No. BMS03)

Fig. 4 Approximation of the empirical DBH distribution of an example stand from the group UID using the kernel density estimator and the GSM model (plot No. UID03)

The remaining 10 plots (those in which the share of fir and beech by tree number was smaller than 80%, as well as those with DBH distributions forming transitional structures) were not used in further analyses.

In the investigated plots, the basal area for all species together ranged from 10.78 to 63.09 \( \mathrm{m}^2\,\mathrm{ha}^{-1} \). The number of trees ranged from 86 to 234 stems per plot. Fir and beech clearly dominated; the basal area varied from 6.62 to 53.5 \( \mathrm{m}^2\,\mathrm{ha}^{-1} \) for fir and from 0.10 to 25.58 \( \mathrm{m}^2\,\mathrm{ha}^{-1} \) for beech.

3.3 Data analysis

Fitting with the GSM model requires three hyperparameters: the number of components J, and the α and β from the conjugate prior on θ. For the approximation of the empirical DBH data using the GSM model, J = 250 and a weight of the prior information ω = 0.35 were assumed (ω values between 0.2 and 0.5 are the usual choices; for detailed information, see Venturini et al. 2008). With these assumptions, the α and β values were calculated for each plot as follows (Venturini et al. 2008):

$$ \beta =\frac{\omega\;\sum_{i=1}^n{x}_i}{1-\omega} $$
(8)
$$ \alpha =\frac{J}{ \max \left({x}_1,\dots, {x}_n\right)}\;\beta $$
(9)
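Equations (8) and (9) translate into a few lines of code. The following Python sketch is an illustration only; the function name `gsm_hyperparameters` is hypothetical, and its defaults simply repeat the values J = 250 and ω = 0.35 stated above.

```python
def gsm_hyperparameters(x, J=250, omega=0.35):
    """Prior hyperparameters (alpha, beta) for theta from Eqs. (8) and (9).

    x: observed DBHs of one plot; J: number of mixture components;
    omega: weight of the prior information, strictly between 0 and 1.
    """
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie strictly between 0 and 1")
    beta = omega * sum(x) / (1.0 - omega)  # Eq. (8)
    alpha = J * beta / max(x)              # Eq. (9)
    return alpha, beta
```

Because the prior mean of θ equals α/β = J / max(x), the gamma component with the largest shape J has its mean J/θ close to the largest observed DBH, so the J components roughly span the observed data range.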

Kernel-type estimators are commonly used as non-parametric estimators for density functions. Let x 1, ..., x n be sample DBHs from an unknown density f. Then, its kernel estimate \( \widehat{f} \) is

$$ \widehat{f}\left( x| h\right)=\frac{1}{ n h}\sum_{i=1}^n K\left(\frac{x-{x}_i}{h}\right) $$
(10)

where K(•) is a kernel function and h is a bandwidth. In this study, a Gaussian density as the kernel and a bandwidth h = 2 cm were used; the width for the DBH classes was chosen to be 2 cm (see also Lopez-de-Ullibarri 2015).
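The estimator in Eq. (10) with a Gaussian kernel can be sketched as follows. This is a plain Python illustration with a hypothetical function name (`gaussian_kde`), not the routine used in the study; the default bandwidth repeats the value h = 2 cm given above.

```python
import math


def gaussian_kde(x, sample, h=2.0):
    """Kernel density estimate of Eq. (10) at point x with a Gaussian kernel.

    sample: observed DBHs; h: bandwidth in the same units as the data.
    """
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - x_i) / h) ** 2) for x_i in sample)
```

Multiplying the estimated density at a class midpoint by the number of trees and the class width (here 2 cm) gives an approximate predicted number of trees in that class, which is one way of obtaining the predicted class frequencies compared with the observed ones.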

Two statistics were proposed for comparing the precision of the approximation of empirical DBH data using the GSM model and the kernel density estimation:

$$ {B}_{\mathrm{DIF}}=\left|{B}_{\mathrm{GSM}}\right|-\left|{B}_{\ker}\right| $$
(11)
$$ {A}_{\mathrm{DIF}}={A}_{\mathrm{GSM}}-{A}_{\ker } $$
(12)

with

$$ {B}_{\bullet }=\frac{1}{l}\sum_{q=1}^l\left({n}_q-{\widehat{n}}_q\right) $$
(13)
$$ {A}_{\bullet }=\frac{1}{l}\sum_{q=1}^l\left|{n}_q-{\widehat{n}}_q\right| $$
(14)

where \( n_q \) and \( {\widehat{n}}_q \) are, respectively, the observed and predicted numbers of trees in the \( q \)th DBH class of the investigated plot, with predictions taken from the GSM model (\( B_{\bullet} \equiv B_{\mathrm{GSM}} \) and \( A_{\bullet} \equiv A_{\mathrm{GSM}} \)) or from the kernel density estimation (\( B_{\bullet} \equiv B_{\ker} \) and \( A_{\bullet} \equiv A_{\ker} \)); \( l \) is the number of DBH classes. The B and A statistics measure the bias and the flexibility of the analysed models, respectively.
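Equations (11) to (14) amount to simple arithmetic on class frequencies, as the following Python sketch shows. The function names are hypothetical, and the inputs are lists of observed and predicted tree numbers per DBH class.

```python
def bias_and_abs_error(observed, predicted):
    """Return (B, A) of Eqs. (13) and (14) for one model.

    observed, predicted: tree numbers per DBH class, in the same class order.
    """
    l = len(observed)
    diffs = [n_q - nhat_q for n_q, nhat_q in zip(observed, predicted)]
    B = sum(diffs) / l                   # mean signed difference (bias)
    A = sum(abs(d) for d in diffs) / l   # mean absolute difference
    return B, A


def difference_statistics(observed, pred_gsm, pred_ker):
    """Return (B_DIF, A_DIF) of Eqs. (11) and (12)."""
    B_gsm, A_gsm = bias_and_abs_error(observed, pred_gsm)
    B_ker, A_ker = bias_and_abs_error(observed, pred_ker)
    return abs(B_gsm) - abs(B_ker), A_gsm - A_ker
```

Negative values of either difference indicate that the GSM model is closer to the observed class frequencies than the kernel density estimate.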

3.4 Simulation studies

In order to generate random DBH data from the GSM model, the procedure using a multimodal distribution and gamma random numbers (hereinafter the MDGR procedure) and MCMC techniques with Metropolis–Hastings sampling (hereinafter the MH method) were employed. Gamma random numbers were generated in multinomial distribution cells using the acceptance-rejection principle with proper choice of the majorisation function (when the shape parameter was less than 1) or as the sum of two independent gamma variates (when the shape parameter was greater than or equal to 1) (Ahrens and Dieter methods; for detailed information, see Ahrens and Dieter 1974, 1982). The standard Metropolis–Hastings algorithm with jumping normal distribution was used (Robert and Casella 2004). For each plot, the following scheme was employed:

  1. 1.

    The empirical DBH distribution was fitted with the GSM model.

  2. 2.

    50 samples of 100, 250 and 500 DBHs each were drawn using the GSM model and the MDGR procedure.

  3. 3.

    50 samples of 100, 250 and 500 DBHs each were drawn using the GSM model and the MH method.
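Steps 2 and 3 of the scheme can be sketched as below for a fitted set of weights and a fitted θ. This is a simplified Python illustration, not the R implementation used in the study: the standard-library gamma generator stands in for the Ahrens and Dieter routines, the component labels are drawn so that their counts form the multinomial cells, and the random-walk step size, burn-in length and starting value of the Metropolis–Hastings chain are assumptions chosen for the example. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def sample_mdgr(n, weights, theta, rng):
    """Draw n values: multinomial cell counts first, then gamma variates per cell."""
    labels = rng.choices(range(1, len(weights) + 1), weights=weights, k=n)
    cell_counts = Counter(labels)  # multinomial counts n_1, ..., n_J
    draws = []
    for j, n_j in cell_counts.items():
        # component j is a gamma distribution with shape j and scale 1/theta
        draws.extend(rng.gammavariate(j, 1.0 / theta) for _ in range(n_j))
    rng.shuffle(draws)
    return draws


def log_gsm_density(x, weights, theta):
    """Logarithm of Eq. (1) for x > 0, using log-sum-exp for stability."""
    terms = [math.log(p) + j * math.log(theta) - math.lgamma(j)
             + (j - 1) * math.log(x) - theta * x
             for j, p in enumerate(weights, start=1) if p > 0.0]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def sample_mh(n, weights, theta, rng, step=1.0, burn_in=1000, start=1.0):
    """Random-walk Metropolis-Hastings chain with a normal jumping distribution."""
    x = start
    log_p = log_gsm_density(x, weights, theta)
    draws = []
    for it in range(burn_in + n):
        proposal = x + rng.gauss(0.0, step)
        if proposal > 0.0:  # the target density is zero for non-positive values
            log_p_new = log_gsm_density(proposal, weights, theta)
            # symmetric proposal: accept with probability min(1, p_new / p_old)
            if math.log(1.0 - rng.random()) < log_p_new - log_p:
                x, log_p = proposal, log_p_new
        if it >= burn_in:
            draws.append(x)
    return draws
```

For DBH data in centimetres, a step size of 1.0 would be small relative to the spread of the distribution; a larger value would normally be chosen. Unlike the first sampler, the chain produces correlated draws, and a random-walk chain may visit a small, distant mode in the right tail only rarely, which is one reason to check generated samples for large DBHs.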

The k-sample Anderson-Darling tests (Scholz and Stephens 1987) were used to test the null hypotheses that (1) the samples come from the same but unspecified continuous distribution function and (2) the samples drawn using the MDGR procedure (block 1) and the MH method (block 2) come from the same but unspecified continuous distribution function (this function may change from block to block).

In the first case, the analyses were conducted for 50 samples containing 100, 250 and 500 DBHs for the MDGR procedure and the MH method; in total, six null hypotheses were tested for each plot. The Anderson-Darling k-sample test was employed; if AD is the Anderson-Darling criterion for k samples, its standardised test statistic is (Scholz and Zhu 2016)

$$ T. AD=\frac{AD-\mu}{\sigma} $$
(15)

with μ and σ representing the mean and standard deviation of AD.

In the second case, the analyses were conducted for 50 samples containing 100, 250 and 500 DBHs; in total, three null hypotheses were tested for each plot; the combined Anderson-Darling k-sample test was employed. This multiple procedure combines several independent k-sample Anderson-Darling tests into one overall test. If \( AD_i \) is the Anderson-Darling criterion for the \( i \)th block of \( k_i \) samples, its standardised test statistic is (Scholz and Zhu 2016)

$$ T.{AD}_i=\frac{AD_i-{\mu}_i}{\sigma_i} $$
(16)

with \( \mu_i \) and \( \sigma_i \) representing the mean and standard deviation of \( AD_i \). The combined Anderson-Darling criterion is (Scholz and Zhu 2016)

$$ {AD}_{\mathrm{comb}}=\sum_{i=1}^M{AD}_i $$
(17)

and

$$ T.{AD}_{\mathrm{comb}}=\frac{AD_{\mathrm{comb}}-{\mu}_c}{\sigma_c} $$
(18)

where

$$ {\mu}_c=\sum_{i=1}^M{\mu}_i $$
(19)
$$ {\sigma}_c=\sqrt{\sum_{i=1}^M{\sigma}_i^2} $$
(20)

and M is the number of blocks (M = 2).
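The arithmetic of Eqs. (17) to (20) is shown in the short Python sketch below. It only combines per-block quantities that are assumed to be already available; computing the Anderson-Darling criteria and their means and standard deviations is left to a dedicated routine such as those in the kSamples package. The function name is hypothetical.

```python
import math


def combined_standardised_ad(ad, mu, sigma):
    """Standardised combined Anderson-Darling statistic, Eqs. (17)-(20).

    ad, mu, sigma: per-block criteria AD_i with their means and standard
    deviations, one entry per block (M = len(ad)).
    """
    ad_comb = sum(ad)                               # Eq. (17)
    mu_c = sum(mu)                                  # Eq. (19)
    sigma_c = math.sqrt(sum(s * s for s in sigma))  # Eq. (20)
    return (ad_comb - mu_c) / sigma_c               # Eq. (18)
```

Summing the variances in Eq. (20) relies on the blocks being independent, which is the assumption stated for the combined test.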

These statistical analyses enabled the assessment of the level of homogeneity within drawn samples (Jamshidian and Jalal 2010). The k-sample Anderson-Darling tests do not require the user to assume that each analysed group belongs to a normal population and has the same variance. In all the cases, the first version of the Anderson-Darling test statistic was computed (for detailed information, see Scholz and Stephens 1987).

For each generated set of 50 samples, the fraction of samples with DBHs >70 cm was calculated. These fractions allowed assessment of the suitability of the two investigated methods in generating random DBH data from the GSM model; the main assessment criterion was the occurrence of trees of an older generation (characterised by DBH >70 cm).
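The criterion can be computed as sketched below, reading "fraction of samples with DBHs >70 cm" as the share of generated samples that contain at least one DBH above the threshold; this reading and the function name are assumptions made for the illustration.

```python
def fraction_with_old_generation(samples, threshold=70.0):
    """Share of samples that contain at least one DBH above the threshold (cm)."""
    hits = sum(1 for sample in samples if any(d > threshold for d in sample))
    return hits / len(samples)
```

A value of 1.00 means that every one of the generated samples contains at least one tree of the older generation.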

Computational procedures were implemented using the statistical software R (R Core Team 2015); the GSM and the kSamples packages of R were also used (Venturini 2015; Scholz and Zhu 2016).

4 Results

In all plots, one to three trees representing the older generation (DBH exceeding 70 cm) were present. In three plots, the DBH of the thickest trees reached 100 cm. Trees with a DBH lower than 50 cm represented from 89 to 99% of all the trees in the investigated plots, whereas those with a DBH lower than 25 cm accounted for 46 to 74% of all the trees. The number of trees varied from 215 to 935 \( \mathrm{ha}^{-1} \). The mean skewness for the plots was 1.3276. Generally, the investigated DBH distributions are highly skewed and heavy-tailed (Figs. 2, 3 and 4).

The GSM model consists of 250 single gamma functions (J = 250). Each of these functions has a particular mixture weight (π 1, ..., π J ). The sum of all the mixture weights for a given model is equal to 1. The sums of 50-length intervals of mixture weights reflect the approximate distribution of these proportions (Table 1). In the plots, DBH distributions are asymmetrical and that is why mixture weight distributions also have longer-than-normal right tails. The mean sums of the 50-length intervals of mixture weights for the investigated plots varied from 0.478732 to 0.0111045 (from left to right; Table 1).

Table 1 Sums of mixture weights (\( \pi_1, \dots, \pi_J \); \( J = 250 \)), scale parameter (\( 1/\theta \)) and goodness-of-fit statistics (\( B_{\mathrm{DIF}} \), \( A_{\mathrm{DIF}} \)) for the gamma shape mixture (GSM) model

The B DIF and A DIF statistics compare the bias and the flexibility of the GSM model and the kernel density estimation (negative numbers show that the GSM model is ‘better’). The B DIF values were higher than zero in the case of all the 20 investigated plots (range 0.009–0.295), while the A DIF values were lower than zero for 14 plots and higher than zero for 6 plots (from −0.619 to 0.218) (Table 1). The values of the calculated statistics indicate that the bias was lower for the kernel density estimation, while the GSM model was characterised by greater flexibility.

A good method of random variate generation must satisfy various criteria, especially precision. To assess precision, p values based on the Anderson-Darling k-sample test were calculated (Table 2). They ranged as follows:

  1. With the MDGR procedure: from 0.0109 to 0.9253 for samples of 100 DBHs, from 0.0719 to 0.9798 for samples of 250 DBHs and from 0.0172 to 0.8839 for samples of 500 DBHs

  2. With the MH method: from 0.0001 to 0.8791 for samples of 100 DBHs, from 0.0001 to 0.7821 for samples of 250 DBHs and from 0.0001 to 0.8897 for samples of 500 DBHs

Table 2 The p values for the Anderson-Darling k-sample test comparing DBH distributions within drawn DBH samples

In terms of precision, the MDGR procedure provides higher p values than the MH method, but the differences are small (Table 2). Therefore, the MDGR procedure is slightly more precise than the MH method. This is especially so in the case of the samples of 250 DBHs. The presented results are confirmed by the combined Anderson-Darling k-sample test (Table 3). The p values were from 0.0001 to 0.9036 for samples of 100 DBHs, from 0.0022 to 0.9964 for samples of 250 DBHs and from 0.0001 to 0.8838 for samples of 500 DBHs (Table 3). The high p values show that the level of the homogeneity within drawn DBH sets was similar for all generated samples and for all the three groups of DBH distributions (RS, BMS and UID; Table 3).

Table 3 The p values for the combined Anderson-Darling k-sample test comparing DBH distributions within drawn DBH samples grouped in two blocks; DBHs drawn using the MDGR procedure were grouped in the first block and DBHs drawn using the MH method were grouped in the second block

The greatest fractions of generated samples containing DBHs >70 cm were achieved for the MDGR procedure in the case of simulations of 500 DBHs in a sample (the maximal fraction was equal to 1.00 for ten plots; Table 4). The smallest fractions were obtained for the MH method in the case of simulations of 100 DBHs in a sample (the maximal fraction was equal to 0.96 for one plot; Table 4).

Table 4 Fractions of the DBHs >70 cm, assessing the frequency of trees representing the older generation in the sample; in all the investigated plots, the older generation was composed of trees with DBHs above 70 cm

The simulations that were carried out showed that both of the investigated methods are capable of simulating the DBH data from the GSM model, but the MDGR procedure was slightly more effective than the MH method.

5 Discussion

For generating DBH data sets from the GSM model, one can use the MDGR procedure and, to a lesser degree, the MH method, preferably to simulate large sets containing e.g. 500 DBHs. In the case of smaller sets, it is always necessary to check whether the generated data include DBHs representing trees from the older generation. A similar procedure can be used in all stands in which one of the tree generations is represented by only a few trees.

This paper has compared two methods for generating random DBH data from the GSM model fed with real data from forests with fir and beech in one geographical region. Future research can be concerned with forests consisting of different species and growing in different regions.

The GSM model is very flexible and thus allows precise approximation of irregular data sets with local extremes. Increasing the value of J increases the precision of the approximation, but this may cause numerical problems. If the empirical irregularity is to be included in the GSM model, the value of J should be increased; if the multimodality is random, the value of J should be decreased. Where specific subpopulations exist, it is desirable to use mixture models in which the component densities represent these subpopulations (Podlaski and Roesch 2014). However, it is necessary to remember that such mixture models are not very useful where the subpopulations differ significantly in their number of elements, as exemplified by highly skewed and heavy-tailed distributions in which one of the subpopulations forms the distribution tail. The very small number of elements in this subpopulation usually makes it impossible to associate a component of the mixture model with the subpopulation.

Fir and beech trees from the older generation usually create only small local DBH maxima above a lower threshold of 70 cm. This kind of highly skewed and heavy-tailed distribution is correctly approximated by the GSM model. The precision of the GSM model was comparable to that obtained with kernel density estimation. This is a very interesting result because kernel density estimation is characterised by high flexibility (e.g. Buch-Larsen et al. 2005; Podlaski and Roesch 2014).

The problem of highly skewed and heavy-tailed distributions can be circumvented by data transformations. Procedures of this kind are used, among others, in the analysis of variance and in the regression models (Box-Cox, etc.). However, there are some possible drawbacks of these methods (Garay et al. 2016): (1) transformations reduce information on the underlying data generation scheme, (2) parameters may lose interpretability on a transformed scale and (3) transformations are usually not universal and often vary with the data set. Hence, in the case of modelling the highly skewed and heavy-tailed DBH distributions, it is necessary to seek flexible theoretical models.

6 Conclusions

This study has revealed that the GSM model is flexible and accurate when modelling the highly skewed and heavy-tailed DBH distributions of two-generation stands. The GSM model precisely separates older and younger tree generations; it is useful in smoothing the small local DBH maxima. A simulation study has shown that the MDGR procedure was slightly more precise than the MH method. The DBH random variates, generated with the use of these methods from the GSM model, represented all tree generations that are significant from a biological point of view. The high structural diversity of patches of natural, near-natural and managed forests, especially with shade-tolerant species, should stimulate further research related to the analysis of empirical DBH distributions in the context of the GSM model.