Forest modelling: the gamma shape mixture model and simulation of tree diameter distributions

Podlaski, Rafał

doi:10.1007/s13595-017-0629-y

Forest modelling: the gamma shape mixture model and simulation of tree diameter distributions

Original Paper
Open access
Published: 03 April 2017

Volume 74, article number 29, (2017)
Cite this article

Download PDF

You have full access to this open access article

Annals of Forest Science Aims and scope Submit manuscript

Forest modelling: the gamma shape mixture model and simulation of tree diameter distributions

Download PDF

Rafał Podlaski ORCID: orcid.org/0000-0002-6957-3902¹

2686 Accesses
22 Citations
9 Altmetric
1 Mention
Explore all metrics

Abstract

• Key message

New types of distribution functions are needed to model the dynamics of stands where important age classes are represented by few trees. In this study the gamma shape mixture model and two simulation methods were used for generating tree diameter data.

• Context

To analyse forest dynamics, it is necessary to know distribution of the characteristics (mainly tree diameters) of trees forming particular developmental phases. In many forest inventories, the measurement of large diameter at breast height (DBH) samples is practically impossible. In this case, DBH distributions can be generated using theoretical models.

• Aims

The aim of this study was to assess the precision of the approximation of empirical DBH data using the gamma shape mixture (GSM) model and kernel density estimation. The strengths and weaknesses of the two simulation methods were presented and discussed.

• Methods

The GSM model was adopted to approximate empirical DBH data collected in 20 near-natural stands. Two simulation methods were used: (a) the procedure based on a multimodal distribution and gamma random numbers (MDGR procedure) and (b) MCMC techniques with Metropolis–Hastings sampling (MH method).

• Results

The GSM model precisely fitted the investigated DBH distributions. The MDGR procedure was slightly more precise than the MH method, especially in the case of the samples of 250 DBHs. The level of homogeneity within the drawn DBH sets was similar for all samples.

• Conclusion

The GSM model is very flexible. The DBH random variates, generated with the use of analysed procedures, represented all tree generations being significant from a biological point of view.

A Gaussian multi-component model for the tree diameter distribution in old-growth forests

Article 26 February 2018

A new paradigm in modelling the evolution of a stand via the distribution of tree sizes

Article Open access 20 November 2017

The Chapman-Richards Distribution and its Relationship to the Generalized Beta

Article Open access 21 May 2019

1 Introduction

Disturbances occurring in forest ecosystems are one of the most important determinants of spatio-temporal development in stands (Gratzer et al. 2004). Due to disturbances of different spatial scales, gaps of varying sizes are formed. These processes have a significant effect on the structure of forests. The specific vertical structure is closely related to the shape of the diameter at breast height (DBH) distribution (Lawton and Putz 1988; Denslow et al. 1998). Many tree stands in various geographic regions contain cohorts of old trees, which are represented by only few individuals, but play a great role in stand structure and in ecosystem functioning. It is difficult to find distribution functions to represent these few large trees. During the approximation of these highly skewed and heavy-tailed DBH distributions, there is often the smoothing problem, which in turn requires the use of methods that are able to fit a tail probability well.

Different models have arisen naturally across a range of problems when modelling DBH in forestry (e.g. Pretzsch 2010). Single flexible theoretical distributions (e.g. Weibull, gamma) have often been used to fit empirical DBH data more or less asymmetrically with a positive skewness (Merganič and Sterba 2006; Gove et al. 2008). Mixture distributions with a few components are an appropriate tool for modelling bi- and multimodal empirical DBH distributions (Zhang et al. 2001; Zasada and Cieszewski 2005; Podlaski 2011a, b; Zasada 2013).

In order to model the dynamics of forest stands with cohorts of old trees, new types of distribution functions are needed. A new approach for density estimation of highly skewed and heavy-tailed distributions, the gamma shape mixture (GSM) model, employs a mixture of gamma density functions with unknown weights (Venturini et al. 2008). A general Bayesian approach allows the creation of a flexible model characterised by a single parameter for all the gamma components and the ordinary set of mixture weights (Jasra et al. 2005; Venturini et al. 2008). This method significantly improves predictive performance in estimating tail probabilities compared to standard approaches employing e.g. single flexible theoretical distributions and mixture distributions with a few components (Venturini et al. 2008). A particularly important advantage of the GSM model is the possibility to use a great number of mixture components. In the case of two-generation stands where the two generations significantly differ in the number of trees, the model makes possible, among other things, to generate random DBH data, taking into account the existence of small local DBH maxima. These maxima, representing the older generation and creating longer-than-normal right tails, cannot be treated as atypical observations. The data are indispensable to correctly present DBH distributions in the case of two-generation stands, in which the older generation is formed by single old trees. Thereby, proposals that overcome the problem of atypical observations in distributions (e.g. by their identification and next, elimination) cannot be used.

In ecology, for analysis of forest dynamics, based on simulation studies, one should use data sets (mainly DBH) characterising the investigated stand in particular developmental phases. Tree lists, minimally a set of DBHs with an indicator of tree species, obtained from measurements made in selected plots are used to define the initial condition. The measurement of large DBH samples is practically impossible in many forest inventories due to economic limitations (e.g. Roesch et al. 2015). In this case, the DBH distributions can be generated using theoretical functions (e.g. Thompson 2000, Gehringer and Turnblom 2014).

Forest growth models based on progressing distributions are characterised by the inclusion of stand heterogeneity in the simulation approach, providing information on tree dimensions (e.g. Porté and Bartelink 2002). The accuracy of such models is primarily determined by the flexibility of the underlying type of theoretical function (e.g. Pretzsch 2010). Stand development is presented as a periodic progression of the frequency distributions. Each developmental phase is represented by a theoretical function of specified parameters. By changing these parameters, the BDH distribution can be shifted along the time axis. The DBH data generation makes it possible to increase the number of DBHs for small samples and then allows comparison of the model outputs with independent data.

Procedures based on Markov chain Monte Carlo (MCMC) techniques are frequently used methods for generating random numbers from probability distributions (Liu 2001). The Monte Carlo methods have become one of the most important tools to sample from complex distributions (e.g. Liu 2001; Robert and Casella 2004). There have been several classes of Monte Carlo techniques, e.g. MCMC techniques with Metropolis–Hastings sampling, sequential Monte Carlo techniques that include for example sequential importance resampling or particle filtering (Kong et al. 1994) and recent development of methods with equi-energy sampling (Kou et al. 2006).

The aims of this study are (1) to compare the precision of the approximation of empirical DBH data employing the GSM model and kernel density estimation (parametric and non-parametric methods) and (2) to assess the suitability of two methods for generating random DBH data from the GSM model: (a) the procedure using a multimodal distribution and gamma random numbers and (b) MCMC techniques with Metropolis–Hastings sampling. The GSM model has not been previously used for the analysis of forest data.

2 The gamma shape mixture model

The GSM model is defined as follows (Lehmann and Casella 1998; Venturini et al. 2008):

$$ f\left( x|{\pi}_1,\dots, {\pi}_J,\theta \right)=\sum_{j=1}^J{\pi}_j{f}_j\left( x|\theta \right) $$

(1)

where J is the number of mixture components (known and fixed), π ₁, ..., π _J are mixture weights (proportions) (unknown) and 1/θ is the scale parameter for the whole GSM model (unknown). The gamma distribution f _j(x| θ) has a probability density function (PDF) given by

$$ {f}_j\left( x|\theta \right)=\frac{\theta^j}{\Gamma (j)}{x}^{j-1}{e}^{-\theta\;x} $$

(2)

Each gamma distribution in the GSM model is indexed by a component-specific shape parameter (j) and has a single scale parameter (1/θ).

The GSM model could also be defined as follows (Venturini et al. 2008):

$$ p\left({x}_1,\dots, {x}_n|{z}_1,\dots, {z}_n,\theta \right)=\frac{\theta^{\sum_{i=1}^n{z}_i}}{\prod_{i=1}^n\Gamma \left({z}_i\right)}\left(\prod_{i=1}^n{x}_i^{z_i-1}\right){e}^{-\theta \sum_{i=1}^n{x}_i} $$

(3)

where z ₁, ..., z _n are the missing elements of the sample (Dempster et al. 1977; Diebolt and Robert 1994). Given x ₁, ..., x _n, an integer z _i between 1 and J could be associated to each x _i that identifies the component of the mixture generating observation x _i; this auxiliary variable z _i identifies to which component the observation x _i belongs.

A general Bayesian approach for estimating the unknown parameters of the GSM model is often used. The π ₁, ..., π _J and θ are independent a priori and the following conjugate prior distributions are specified (Venturini et al. 2008):

$$ {\pi}_1,\ldots,{\pi}_J\sim {D}_J\left(\frac{1}{J},\dots, \frac{1}{J}\right) $$

(4)

and

$$ \theta \sim G\left(\alpha, \beta \right) $$

(5)

where D _J(•) is a Dirichlet distribution and G(•) is a gamma distribution, J, α and β are the hyperparameters. The posterior distribution is (Venturini et al. 2008)

$$ p\left({\pi}_1,\dots, {\pi}_J,\theta |{x}_1,\dots, {x}_n,{z}_1,\dots, {z}_n\right)\propto \left(\prod_{j=1}^J{\pi}_j^{\left(1/ J\right)+{n}_j-1}\right){\theta}^{\alpha +\left(\sum_{i=1}^n{z}_i\right)-1}{e}^{-\left(\beta +\sum_{i=1}^n{x}_i\right)\theta} $$

(6)

where

$$ {n}_j=\sum_{i=1}^n\mathrm{I}\left({z}_i= j\right) $$

as well as j = 1, ..., J and Ι(•) is the indicator function.

The posterior distribution is estimated using a Gibbs sampler, the parameter θ is derived analytically through integration. After having integrated out θ the posterior distribution is (Venturini et al. 2008)

$$ p\left({\pi}_1,\dots, {\pi}_J,\theta |{x}_1,\dots, {x}_n,{z}_1,\dots, {z}_n\right)\propto \prod_{j=1}^J{\pi}_j^{\left(1/ J\right)+{n}_j-1} $$

(7)

The primary advantage of this strategy is that the Markov chain runs in a smaller space (Robert 1996; MacEachern et al. 1999; Venturini et al. 2008).

3 Materials and methods

3.1 Field measurements

The plots were sampled in two-generation stands with fir Abies alba Mill. and beech Fagus sylvatica L., in protected, near-natural forests in the Świętokrzyskie Mountains (Świętokrzyski National Park, 50° 50′–50° 53′ N, 21° 01′–21° 05′ E). The study area lies at an elevation between 320 and 590 m above sea level. The most common plant associations are Dentario glandulosae-Fagetum and Abietetum polonicum (nomenclature after Matuszkiewicz 2008). In these stands, 30 circular plots from 0.2 to 0.4 ha were randomly selected. The radius of each plot was chosen so that the whole plot was situated within the boundaries of a homogenous patch of similar vertical stand structure. The age of trees, determined on the basis of increment core analysis, carried out during the present study and earlier dendrochronological research, shows that in the investigated area fir and beech trees of the older generation were usually characterised by DBHs >70 cm (Podlaski 2008, 2011a, b; Podlaski and Żelezik 2012). In each plot, the DBH was measured for all living trees >6.9 cm in diameter.

3.2 Forest data

To identify similar DBH structures in the investigated plots, 21 were used variables: fractions of the tree number (10 variables) and fractions of the basal area (10 variables) at 10-cm intervals from 7 to 107 cm, and the number of main extremes for DBH distributions (1 variable). The hierarchical cluster analysis (HCA) was employed with the Jaccard measure and the Ward’s minimum variance agglomeration method. The 20 plots were clustered in three main groups (Fig. 1):

1.
Group RS includes DBH distributions showing the rotated-sigmoid (RS) shape (10 plots) (Fig. 2).
2.
Group BMS includes DBH distributions showing the typical bimodal M-shape (5 plots) (Fig. 3).
3.
Group UID includes the unimodal irregularly descending distributions (5 plots) (Fig. 4).

The remaining 10 plots, in which the share of fir and beech assessed on the basis of a tree number was smaller than 80% as well as DBH distributions forming transitional structures, were not used in further studies.

In the investigated plots basal area for all species together was from 10.78 to 63.09 m² ha⁻¹. The number of trees ranged from 86 to 234 stems per plot. Fir and beech definitely dominated and the appropriate values of the basal area varied from 6.62 to 53.5 m² ha⁻¹ for fir and from 0.10 to 25.58 m² ha⁻¹ for beech.

3.3 Data analysis

Fitting with the GSM model requires three hyperparameters: the number of components J, and the α and β from the conjugate prior on θ. During the approximation of the empirical DBH data using the GSM model, it was assumed that the value of J = 250 and the weight of the prior information ω = 0.35 (ω values between 0.2 and 0.5 are usually choices; for detailed information, see Venturini et al. 2008). With these assumptions for each plot, the α and β values were calculated as follows (Venturini et al. 2008):

$$ \beta =\frac{\omega\;\sum_{i=1}^n{x}_i}{1-\omega} $$

(8)

$$ \alpha =\frac{J}{ \max \left({x}_1,\dots, {x}_n\right)}\;\beta $$

(9)

Kernel-type estimators are commonly used as non-parametric estimators for density functions. Let x ₁, ..., x _n be sample DBHs from an unknown density f. Then, its kernel estimate $ \widehat{f} $ is

$$ \widehat{f}\left( x| h\right)=\frac{1}{ n h}\sum_{k=1}^n K\left(\frac{x-{x}_i}{h}\right) $$

(10)

where K(•) is a kernel function and h is a bandwidth. In this study, a Gaussian density as the kernel and a bandwidth h = 2 cm were used; the width for the DBH classes was chosen to be 2 cm (see also Lopez-de-Ullibarri 2015).

Two statistics were proposed for comparing the precision of the approximation of empirical DBH data using the GSM model and the kernel density estimation:

$$ {B}_{\mathrm{DIF}}=\left|{B}_{\mathrm{GSM}}\right|-\left|{B}_{\ker}\right| $$

(11)

$$ {A}_{\mathrm{DIF}}={A}_{\mathrm{GSM}}-{A}_{\ker } $$

(12)

with

$$ {B}_{\bullet }=\frac{1}{l}\sum_{q=1}^l\left({n}_q-{\widehat{n}}_q\right) $$

(13)

$$ {A}_{\bullet }=\frac{1}{l}\sum_{q=1}^l\left|{n}_q-{\widehat{n}}_q\right| $$

(14)

where n _q and $ {\widehat{n}}_q $ are the observed and predicted numbers of trees for the GSM model (B _• ≡ B _GSM and A _• ≡ A _GSM) or for the kernel density estimation (B _• ≡ B _ker and A _• ≡ A _ker), respectively, in the qth DBH class in the investigated plot; l is the number of DBH classes. The values of the B _• and A _• statistics indicate a measure of the bias and the flexibility of the analysed models, respectively.

3.4 Simulation studies

In order to generate random DBH data from the GSM model, the procedure using a multimodal distribution and gamma random numbers (hereinafter the MDGR procedure) and MCMC techniques with Metropolis–Hastings sampling (hereinafter the MH method) were employed. Gamma random numbers were generated in multinomial distribution cells using the acceptance-rejection principle with proper choice of the majorisation function (when the shape parameter was less than 1) or as the sum of two independent gamma variates (when the shape parameter was greater than or equal to 1) (Ahrens and Dieter methods; for detailed information, see Ahrens and Dieter 1974, 1982). The standard Metropolis–Hastings algorithm with jumping normal distribution was used (Robert and Casella 2004). For each plot, the following scheme was employed:

1.
The empirical DBH distribution was fitted with the GSM model.
2.
50 samples of 100, 250 and 500 DBHs each were drawn using the GSM model and the MDGR procedure.
3.
50 samples of 100, 250 and 500 DBHs each were drawn using the GSM model and the MH method.

The k-sample Anderson-Darling tests (Scholz and Stephens 1987) were used to test the null hypotheses that (1) the samples come from the same but unspecified continuous distribution function and (2) the samples drawn using the MDGR procedure (block 1) and the MH method (block 2) come from the same but unspecified continuous distribution function (this function may change from block to block).

In the first case, the analyses were conducted for 50 samples containing 100, 250 and 500 DBHs for the MDGR procedure and the MH method; in total, six null hypotheses were tested for each plot. The Anderson-Darling k-sample test was employed; if AD is the Anderson-Darling criterion for k samples, its standardised test statistic is (Scholz and Zhu 2016)

$$ T. AD=\frac{AD-\mu}{\sigma} $$

(15)

with μ and σ representing the mean and standard deviation of AD.

In the second case, the analyses were conducted for 50 samples containing 100, 250 and 500 DBHs; in total, three null hypotheses were tested for each plot; the combined Anderson-Darling k-sample test was employed. This multiple procedure combines several independent k-sample Anderson-Darling tests into one overall test. If AD _i is the Anderson-Darling criterion for the ith block of k _i samples, its standardised test statistic is (Scholz and Zhu 2016)

$$ T.{AD}_i=\frac{AD_i-{\mu}_i}{\sigma_i} $$

(16)

with μ _i and σ _i representing the mean and standard deviation of AD _i. The combined Anderson-Darling criterion is (Scholz and Zhu 2016)

$$ {AD}_{\mathrm{comb}}=\sum_{i=1}^M{AD}_i $$

(17)

and

$$ T.{AD}_{\mathrm{comb}}=\frac{AD_{\mathrm{comb}}-{\mu}_c}{\sigma_c} $$

(18)

where

$$ {\mu}_c=\sum_{i=1}^M{\mu}_i $$

(19)

$$ {\sigma}_c=\sqrt{\sum_{i=1}^M{\sigma}_i^2} $$

(20)

and M is the number of blocks (M = 2).

These statistical analyses enabled the assessment of the level of homogeneity within drawn samples (Jamshidian and Jalal 2010). The k-sample Anderson-Darling tests do not require the user to assume that each analysed group belongs to a normal population and has the same variance. In all the cases, the first version of the Anderson-Darling test statistic was computed (for detailed information, see Scholz and Stephens 1987).

For each generated set of 50 samples, the fraction of samples with DBHs >70 cm was calculated. These fractions allowed assessment of the suitability of the two investigated methods in generating random DBH data from the GSM model; the main assessment criterion was the occurrence of trees of an older generation (characterised by DBH >70 cm).

Computational procedures were implemented using the statistical software R (R Core Team 2015); the GSM and the kSamples packages of R were also used (Venturini 2015; Scholz and Zhu 2016).

4 Results

In all plots, one to three trees representing the older generation (DBH exceeding 70 cm) were present. In three plots, the DBH of the thickest trees reached 100 cm. Trees of a DBH lower than 50 cm represented from 89 to 99% of all the trees in the investigated plots, whereas those with a DBH lower than 25 cm accounted for 46 to 74% of all the trees. The number of trees varied from 215 to 935 N ha⁻¹. The mean skewness for the plots was 1.3276. Generally, investigated DBH distributions are highly skewed and heavy-tailed (Figs. 2, 3 and 4).

The GSM model consists of 250 single gamma functions (J = 250). Each of these functions has a particular mixture weight (π ₁, ..., π _J). The sum of all the mixture weights for a given model is equal to 1. The sums of 50-length intervals of mixture weights reflect the approximate distribution of these proportions (Table 1). In the plots, DBH distributions are asymmetrical and that is why mixture weight distributions also have longer-than-normal right tails. The mean sums of the 50-length intervals of mixture weights for the investigated plots varied from 0.478732 to 0.0111045 (from left to right; Table 1).

Table 1 Sums of mixture weights (π ₁, ..., π _J ; J = 250), scale parameter (1/θ) and goodness-of-fit statistics (B _DIF, A _DIF) for the gamma shape mixture (GSM) model

Full size table

The B _DIF and A _DIF statistics compare the bias and the flexibility of the GSM model and the kernel density estimation (negative numbers show that the GSM model is ‘better’). The B _DIF values were higher than zero in the case of all the 20 investigated plots (range 0.009–0.295), while the A _DIF values were lower than zero for 14 plots and higher than zero for 6 plots (from −0.619 to 0.218) (Table 1). The values of the calculated statistics indicate that the bias was lower for the kernel density estimation, while the GSM model was characterised by greater flexibility.

A desirable method of random variates generation must include various criteria, especially precision. For precise criterion p value parameters based on the Anderson-Darling k-sample test were calculated (Table 2).

1.
With the MDGR procedure—from 0.0109 to 0.9253 for samples of 100 DBHs, from 0.0719 to 0.9798 for samples of 250 DBHs and from 0.0172 to 0.8839 for samples of 500 DBHs
2.
With the MH method—from 0.0001 to 0.8791 for samples of 100 DBHs, from 0.0001 to 0.7821 for samples of 250 DBHs and from 0.0001 to 0.8897 for samples of 500 DBHs

Table 2 The p values for the Anderson-Darling k-sample test comparing DBH distributions within drawn DBH samples

Full size table

In terms of precision, the MDGR procedure provides higher p values than the MH method, but the differences are small (Table 2). Therefore, the MDGR procedure is slightly more precise than the MH method. This is especially so in the case of the samples of 250 DBHs. The presented results are confirmed by the combined Anderson-Darling k-sample test (Table 3). The p values were from 0.0001 to 0.9036 for samples of 100 DBHs, from 0.0022 to 0.9964 for samples of 250 DBHs and from 0.0001 to 0.8838 for samples of 500 DBHs (Table 3). The high p values show that the level of the homogeneity within drawn DBH sets was similar for all generated samples and for all the three groups of DBH distributions (RS, BMS and UID; Table 3).

Table 3 The p values for the combined Anderson-Darling k-sample test comparing DBH distributions within drawn DBH samples grouped in two blocks; DBHs drawn using the MDGR procedure were grouped in the first block and DBHs drawn using the MH method were grouped in the second block

Full size table

The greatest fractions for generated samples containing DBHs >70 cm were achieved for the MDGR procedure in the case of simulations of 500 DBHs in a sample (maximal fraction was equal 1.00 for ten plots; Table 4). The smallest fractions were obtained for the MH method in the case of simulations of 100 DBHs in a sample (maximal fraction was equal 0.96 for one plot; Table 4).

Table 4 Fractions of the DBHs >70 cm, assessing the frequency of trees representing the older generation in the sample; in all the investigated plots, the older generation was composed of trees with DBHs above 70 cm

Full size table

The simulations that were carried out showed that both of the investigated methods are capable of simulating the DBH data from the GSM model, but the MDGR procedure was slightly more effective than the MH method.

5 Discussion

For generating the DBH data sets from the GSM model, one can use the MDGR procedure and, to a lesser degree, the MH method, preferably to simulate large sets containing e.g. 500 DBHs. In the case of smaller sets, it is always necessary to check if within the generated data there are DBHs representing trees from an older generation. A similar procedure can be used in all stands, in which one of the tree generations is represented but by few trees.

This paper has compared two methods for generating random DBH data from the GSM model fed with real data from forests with fir and beech in one geographical region. Future research can be concerned with forests consisting of different species and growing in different regions.

The GSM model is very flexible and thus it allows precise approximation of irregular data sets with local extremes. Increasing the value of J, we can increase the precision of the approximation but this may cause numerical problems. If we want to include empirical irregularity in the GSM models, then we should increase the value of J but if multimodality is random, then we should decrease the value of J. In the case of existence of specific subpopulations, it is desirable to use mixture models, in which component densities represent these subpopulations (Podlaski and Roesch 2014). However, it is necessary to remember that mixture models are not very useful where there is a significant difference in the number of elements constituting the subpopulations, as exemplified by highly skewed and heavy-tailed distributions in which one of the subpopulations forms the distribution tail. The very small number of elements of this subpopulation usually makes impossible to associate the component of the mixture model with the subpopulation.

Fir and beech trees from the older generation usually create only small local DBH maxima within a lower threshold of over 70 cm. This kind of highly skewed and heavy-tailed distribution is correctly approximated by the GSM model. The precision of the GSM model was comparable to the approximation precision obtained with the use of the kernel density estimation. This is a very interesting result because the kernel density estimation is characterised by high flexibility (e.g. Buch-Larsen et al. 2005; Podlaski and Roesch 2014).

The problem of highly skewed and heavy-tailed distributions can be circumvented by data transformations. Procedures of this kind are used, among others, in the analysis of variance and in the regression models (Box-Cox, etc.). However, there are some possible drawbacks of these methods (Garay et al. 2016): (1) transformations reduce information on the underlying data generation scheme, (2) parameters may lose interpretability on a transformed scale and (3) transformations are usually not universal and often vary with the data set. Hence, in the case of modelling the highly skewed and heavy-tailed DBH distributions, it is necessary to seek flexible theoretical models.

6 Conclusions

This study has revealed that the GSM model is flexible and accurate when modelling the highly skewed and heavy-tailed DBH distributions of two-generation stands. The GSM model precisely separates older and younger tree generations; it is useful in smoothing the small local DBH maxima. A simulation study has shown that the MDGR procedure was slightly more precise than the MH method. The DBH random variates, generated with the use of these methods from the GSM model, represented all tree generations that are significant from a biological point of view. The high structural diversity of patches of natural, near-natural and managed forests, especially with shade-tolerant species, should stimulate further research related to the analysis of empirical DBH distributions in the context of the GSM model.

References

Ahrens JH, Dieter U (1974) Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing 12:223–246
Article Google Scholar
Ahrens JH, Dieter U (1982) Generating gamma variates by a modified rejection technique. Commun ACM 25:47–54
Article Google Scholar
Buch-Larsen T, Nielsen JP, Guillen M, Bolance C (2005) Kernel density estimation for heavy-tailed distribution using the Champernowne transformation. Statistics 39:503–518. doi:10.1080/02331880500439782
Article Google Scholar
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
Google Scholar
Denslow JS, Ellison AM, Sanford RE (1998) Treefall gap size effects on above- and below-ground processes in a tropical wet forest. J Ecol 86:597–609
Article Google Scholar
Diebolt J, Robert CP (1994) Estimation of finite mixture distributions through Bayesian sampling. J R Stat Soc Ser B 56:363–375
Google Scholar
Garay AM, Lachos VH, Lin TI (2016) Nonlinear censored regression models with heavy-tailed distributions. Stat Interface 9:281–293
Article Google Scholar
Gehringer KR, Turnblom EC (2014) Constructing a virtual forest: using hierarchical nearest neighbor imputation to generate simulated tree lists. Can J For Res 44:711–719. doi:10.1139/cjfr-2014-0020
Article Google Scholar
Gove JH, Ducey MJ, Leak WB, Zhang L (2008) Rotated sigmoid structures in managed uneven-aged northern hardwood stands: a look at the Burr type III distribution. Forestry 81:161–176. doi:10.1093/forestry/cpm025
Article Google Scholar
Gratzer G, Canham CD, Dieckmann U, Fischer A, Iwasa Y, Law R, Lexer MJ, Spies T, Splechtna B, Szwagrzyk J (2004) Spatio-temporal development of forests—current trends in field methods and models. Oikos 107:3–15. doi:10.1111/j.0030-1299.2004.13063.x
Article Google Scholar
Jamshidian M, Jalal S (2010) Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data. Psychometrika 75:649–674. doi:10.1007/s11336-010-9175-3
Article PubMed PubMed Central Google Scholar
Jasra A, Holmes CC, Stephens DA (2005) Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat Sci 20:50–67. doi:10.1214/088342305000000016
Article Google Scholar
Kong A, Liu JS, Wong WH (1994) Sequential imputations and Bayesian missing data problems. J Am Stat Assoc 89:278–288
Article Google Scholar
Kou SC, Zhou Q, Wong WH (2006) Equi-energy sampler with applications in statistical inference and statistical mechanics (with discussion). Ann Stat 34:1581–1619. doi:10.1214/009053606000000515
Article Google Scholar
Lawton RO, Putz FE (1988) Natural disturbance and gap-phase regeneration in a wind-exposed tropical cloud forest. Ecology 69:764–777
Article Google Scholar
Lehmann EL, Casella G (1998) Theory of point estimation. Springer, New York
Google Scholar
Liu JS (2001) Monte Carlo strategies in scientific computing. Springer, New York
Google Scholar
Lopez-de-Ullibarri I (2015) Bandwidth selection in kernel distribution function estimation. Stata J 15:784–795
Google Scholar
MacEachern SN, Clyde M, Liu JS (1999) Sequential importance sampling for nonparametric Bayes models: the next generation. Can J Stat 27:251–267. doi:10.2307/3315637
Article Google Scholar
Matuszkiewicz JM (2008) Zespoły leśne Polski. Państwowe Wydawnictwo Naukowe, Warszawa
Google Scholar
Merganič J, Sterba H (2006) Characterisation of diameter distribution using the Weibull function: method of moments. Eur J For Res 125:427–439. doi:10.1007/s10342-006-0138-2
Article Google Scholar
Podlaski R (2008) Dynamics in central European near-natural Abies–Fagus forests: does the mosaic-cycle approach provide an appropriate model? J Veg Sci 19:173–182. doi:10.3170/2008-8-18350
Article Google Scholar
Podlaski R (2011a) Modelowanie rozkładów pierśnic drzew z wykorzystaniem rozkładów mieszanych I. Definicja, charakterystyka i estymacja parametrów rozkładów mieszanych. Sylwan 155(4):244–252
Google Scholar
Podlaski R (2011b) Modelowanie rozkładów pierśnic drzew z wykorzystaniem rozkładów mieszanych II. Aproksymacja rozkładów pierśnic w lasach wielopiętrowych. Sylwan 155(5):293–300
Google Scholar
Podlaski R, Roesch FA (2014) Modelling diameter distributions of two-cohort forest stands with various proportions of dominant species: a two-component mixture model approach. Math Biosci 249:60–74. doi:10.1016/j.mbs.2014.01.007
Article PubMed Google Scholar
Podlaski R, Żelezik M (2012) Ocena kondycji modrzewia Larix decidua Mill. subsp. polonica (Racib.) Domin i innych gatunków drzew na Chełmowej Górze w Świętokrzyskim Parku Narodowym. Sylwan 156(3):170–181
Google Scholar
Porté A, Bartelink HH (2002) Modelling mixed forest growth: a review of models for forest management. Ecol Model 150:141–188. doi:10.1016/s0304-3800(01)00476-8
Article Google Scholar
Pretzsch H (2010) Forest dynamics, growth and yield. From measurement to model. Springer, Berlin
Book Google Scholar
R Development Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna Available from www.R-project.org
Google Scholar
Robert CP (1996) Mixtures of distributions: inference and estimation. In: Gilks WR, Richardson S, Spiegelhalter DJ (eds) Markov chain Monte Carlo in practice. Chapman and Hall/CRC, New York, pp 441–464
Google Scholar
Robert CP, Casella G (2004) Monte Carlo statistical methods. Springer, New York
Book Google Scholar
Roesch FA, Coulston JW, van Deusen PC, Podlaski R (2015) Evaluation of image-assisted forest monitoring: a simulation. Forests 6:2897–2917. doi:10.3390/f6092897
Article Google Scholar
Scholz FW, Stephens MA (1987) K-sample Anderson-Darling tests. J Am Stat Assoc 82:918–924
Google Scholar
Scholz FW, Zhu A (2016) kSamples: k-sample rank tests and their combinations. R package version 1:2–3 http://CRAN.R-project.org/package=kSamples. Accessed 1 March 2016
Google Scholar
Thompson JR (2000) Simulation: a modeler’s approach. John Wiley & Sons, New York
Google Scholar
Venturini S (2015) GSM: gamma shape mixture. R package version 1(3):2 http://CRAN.R-project.org/package=GSM. Accessed 1 March 2016
Google Scholar
Venturini S, Dominici F, Parmigiani G (2008) Gamma shape mixtures for heavy-tailed distributions. Ann Appl Stat 2:756–776. doi:10.1214/07-AOAS156
Article Google Scholar
Zasada M (2013) Evaluation of the double normal distribution for tree diameter distribution modeling. Silv Fenn 47,id 956:17 p. doi: 10.14214/sf.956
Zasada M, Cieszewski CJ (2005) A finite mixture distribution approach for characterizing tree diameter distributions by natural social class in pure even-aged Scots pine stands in Poland. For Ecol Manag 204:145–158. doi:10.1016/j.foreco.2003.12.023
Article Google Scholar
Zhang LJ, Gove JH, Liu C, Leak WB (2001) A finite mixture of two Weibull distributions for modeling the diameter distributions of rotated-sigmoid, uneven-aged stands. Can J For Res 31:1654–1659. doi:10.1139/x01-086
Article Google Scholar

Download references

Acknowledgments

The author wishes to thank the editor and the referees for valuable and pertinent comments.

Author information

Authors and Affiliations

Department of Nature Protection and Plant Physiology, Institute of Biology, Jan Kochanowski University, ul. Świętokrzyska 15, 25-406, Kielce, Poland
Rafał Podlaski

Authors

Rafał Podlaski
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rafał Podlaski.

Ethics declarations

Funding

This work was supported by the Polish Ministry of Science and Higher Education (public resources dedicated to research in 2010–2013, Grant No. N N309 044138).

Additional information

Handling Editor: Aaron R. Weiskittel

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Podlaski, R. Forest modelling: the gamma shape mixture model and simulation of tree diameter distributions. Annals of Forest Science 74, 29 (2017). https://doi.org/10.1007/s13595-017-0629-y

Download citation

Received: 27 December 2016
Accepted: 10 March 2017
Published: 03 April 2017
DOI: https://doi.org/10.1007/s13595-017-0629-y

Forest modelling: the gamma shape mixture model and simulation of tree diameter distributions