Introduction

Larix gmelinii subsp. principis-rupprechtii (Mayr) A. E. Murray (Pinaceae) is the main afforestation tree species in the mountains of northern China (Fu 2017) because of its faster growth, excellent wood materials, stronger resistance to bad weather and wind, and contributions to soil conservation. It is thus important in mountainous regions. In recent years, however, large-scale wilting has been found in Larix forests in these regions, although the death rate of Larix species differs significantly among regions due to differences in site conditions and environmental factors (Chen and Hua 1991; Ban et al. 1997).

Predicting tree mortality is one of the important parts of forest growth and yield models (Clutter and Jones 1980; Knoebel and Burkhart 1986). Information on tree mortality and potential causes are very important for understanding forest dynamics (Das and Nathan 2015) because mortality may strongly influence future stand status (Bircher et al. 2015). Mortality has a significant influence on the prediction accuracy of changes of global stand structures (Dietze and Jaclyn 2014). Because we lack a comprehensive understanding of tree mortality is incomprehensive, as tree mortality is rarely observed and its cause is unclear (Das and Nathan 2015; Vanoni et al. 2016). As a result, tree mortality is still the most difficult part to incorporate in models to predict forest growth and harvest (Hamilton and Edwards 1976).

Tree mortality results from the combined effect of environmental factors, stand factors and genetic characteristics of tree and thus differs among stands; trees most often gradually decline in vitality until they die. Tree mortality is highly complex, with a multifactor synergism and has considerable degree of the randomness; therefore, the underlying mechanisms are difficult to elucidate (Sala et al. 2010), limiting modeling capability (Galbraith et al. 2010; Adams et al. 2013).

The many factors and their interactions of the factors affecting mortality at the same time in the same place make it difficult to describe variability of the observed mortality data using traditional modelling approaches such as ordinary least square regression. Alternatively, mixed-effects modelling makes it possible to describe mortality more effectively than the traditional modeling approach (Zhang et al. 2014, 2017b). Thus, mixed-effects tree mortality models should be developed to improve predictions of forest damage at stand levels.

Stand level mortality is a count variable because there are often no dead trees in a stand. The least squares method implicitly presumes that the data are Gaussian distributed with constant variances or at least satisfy Gauss–Markov assumptions. If the least squares method is applied to data with a large proportion of zero counts, the estimated results would be biased. Thus, linear models are not appropriate to describe mortality. A generalized linear model is often used to describe the variability of tree mortality, in which a dependent variable follows an exponential distribution, which may not be appropriate for potential mortality patterns. Other probability distributions such as negative binomial distribution, binomial distribution and Poisson distribution are commonly used for mortality modeling with better results (Zhang et al. 2014).

When sample plots have little or no dead trees, a large amount of zero data may be possible, and the structure of this data is discrete (Eid and Tuhus 2001). The Poisson model is often used for counting; however, the Poisson regression must have equality of the mean and the variance, so the negative binomial model is sometimes used for counting (Rashid 2016; Zhang et al. 2017a). However, in practical situations, some data are too discrete, and the negative binomial model is not suitable. Sometimes, if the model is implemented, interpretation of the results can distort (Ping et al. 2008). In this situation, researchers have applied the zero-inflated model and Hurdle model to fit the mortality data because these methods can effectively solve the heterogeneity problems of the data (Hu et al. 2011; Yang 2014; Fang et al. 2016). Bayesian estimation methods have also been used (Alspach and Sorenson 1972) for mortality modeling. In forestry, counting models are mainly used for counting forest fire incidents (Kwak et al. 2012; Xiao et al. 2015; Susaeta et al. 2016) and rarely applied to tree mortality modeling (Affleck 2006; Li et al. 2019; Zhang et al. 2014). A two-step method has also been used to build stand-level mortality models (Woollons 1998; Eid and Oyen 2003) and involves fitting a logistic function to tree mortality data, and stand-level mortality is obtained by summing the number of dead trees in the stand. Because detailed information on individual trees is needed, with the chance of errors accumulating and thus reducing the accuracy of the stand-level mortality information.

Considering all these issues, here we used seven commonly used counting model (Poisson model and negative binomial model, zero-inflated Poisson model, zero-inflated negative binomial model, Hurdle Poisson model, Hurdle negative binomial model, and logistic regression model) to fit the tree mortality data. Considering different candidate models to fit data provides a good opportunity to select the most suitable model according to data patterns. The best-fitted model was then selected to describe the phenomenon of the regional random death of Larix. The presented mortality models will be useful for estimating comprehensive growth processes for Larix forests in northern China for developing more effective silvicutural strategies and forest management plans.

Materials and methods

1Data collection

We established 102 temporary sample plots (TSPs) in state-owned Larix forests in Shanxi Province, China to collect mortality data (Fig. 1): 67 in the Guandi Mountain Forest and 35 in the Boqiang Forest. These 102 TSPs did not have any have obvious damage due to disease and pest. Each TSP was square-shaped and 0.04 ha. The TSPs were selected to provide representative information for a variety of stand structures and densities, tree heights and ages, and site productivity. Data was collected from July through September in 2015. For each stand structure, stand origin was recorded, stand age was determined, and canopy density, and height of Larix with DBH larger than 5 cm were measured. All 102 TSPs originated from natural forests. Tree height was measured with an ultrasonic altimeter; the crown was measured in four directions using a hand-held laser range finder; the age of each dominant tree was determined by counting rings obtained from cores drilled at breast height the dominant height of the stand was obtained as an average of the five tallest trees in each quadrat within the sample plot. Summary statistics are presented in Table 1. The climate in the studied area is temperate continental. In Guandi Mountain Forest, mean annual temperate ranges from 3 °C to 7 °C, and mean annual rainfall is 822.6 mm. In Boqiang Forest, mean annual temperate ranges from ‒1 °C to 8 °C, and mean annual rainfall is about 400 mm.

Fig. 1
figure 1

Study area showing the sample plot locations

Table 1 Summary of stand-level variables; D: stand mean diameter; N: stand density (number of stems per hectare), DH: stand dominant height; S: basal area per hectare; and RSI: relative spacing index; MC: stand-level mortality in a sample plot; SD: standard deviation

Methods

We mainly choose stand factors to assess their affected on tree mortality. Stand factors include stand density, competition index, stand productivity, stand structure, etc. In most cases, these factors may be considered simultaneously or several of them are considered (Affleck 2006; Zhang et al. 2014; Das and Nathan 2015). Based on the research data, the main factors affecting stand level mortality were calculated, including stand density, stand mean diameter, stand dominant height, basal area per hectare, relative spacing index. Tree mortality patterns are shown in Fig. 2.

Fig. 2
figure 2

Tree mortality distribution patterns by number of dead trees per sample plot

Variable selection

Stand variables characterized by a meaningful biological explanation were selected as predictor variables in the tree mortality models. The dominant height (DH), which describes the combined effects of stand development and site productivity calculated and evaluated its potential contribution to the tree mortality models. Similarly, sample plot mean diameter (D), number of trees (N) and basal area per hectare (S), and relative spacing index (RSI), which were assumed to describe stand density and competition, were also evaluated for their potential contributions to the tree mortality variations. Multicollinearity among the independent variables was verified with the variance inflation factor (VIF). According to a common rule-of-thumb, multicollinearity among variables was considered to occur when VIF > 5 (Akinwande et al. 2015). Thus, the variance inflation factor (VIF) was used to examine whether variables would be significantly correlated with each other, and variables with VIF < 5 were retained in our final models. We retained only three stand-level predictor variables in our tree mortality models, and they are DH, D, and S.

Model development

We considered seven commonly used versatile functions to develop the tree mortality models, such as Poisson model and negative binomial model (NB), which refer to as the standard function, zero-inflated Poisson model (ZIP), zero-inflated negative binomial model (ZINB), Hurdle Poisson model (HP), Hurdle negative binomial model (HNB), and logistic regression model. We expanded each of these functions through the inclusion of important stand-level variables (D, DH, S), random component and dummy variable. More details of the expanded models were given in Table 2.

Table 2 Summarized forms of mortality functions, which were expanded through inclusion of three predictor variables

When a dummy variable describing regional variations in tree mortality was added to parameter β1 in all the seven models, dummy variable tree mortality models were formed (Table 3).

Table 3 Tree mortality models with dummy variable

We formulated the tree mortality models using each of the seven base models by incorporating dummy variable describing regional mortality variation and random effects accounting for forest block effects in the tree mortality models. The mixed-effects tree mortality models with dummy variables we formulated are given in Table 4.

Table 4 Mixed-effects models with dummy variable and random effects

In all the models in Table 4, the vectors of errors and block-level random effects (ui1, ui2) are defined by ζi ~ N(0, R) and μi ~ N(0, D), respectively, meaning that error vector is assumed to have a normal distribution with zero mean and within-block variance–covariance matrix Ri, defined by Eq. 22.

$$R_{i} = \sigma^{2} G_{i}^{0.5} \tau_{i} G_{i}^{0.5}$$
(22)

Vector μi of the random effects (ui1, ui2)in these models (Eqs. 15–21) was assumed to have a multivariate normal distribution with zero mean and block variance–covariance matrix D, defined by Eq. 23.

$$D\,=\,\left[ {\begin{array}{*{20}c} {\sigma_{{\mu_{i1} }}^{2} } & {\sigma_{{\mu_{i1} \mu_{i2} }}^{2} } \\ {\sigma_{{\mu_{i2} \mu_{i1} }}^{2} } & {\sigma_{{\mu_{i2} }}^{2} } \\ \end{array} } \right]$$
(23)

In this study, all parameter vectors can be estimated through the maximum likelihood method. Parameter estimation was implemented using the glmmTMB package (Brooks et al. 2017) in R 3.6.3 (R Core Team 2020).

Model selection and goodness of fit

Various statistical indicators were used to compare the fitting performance of the candidate models presented above. To contrast the goodness of fits between these models, coefficient of determination (R2), mean residual error (MD), total relative error (TRE), and root mean square error (RMSE) were used. The expressions of these indicators are given below:

$$R^{2} = 1 - \frac{{\sum\nolimits_{i = 1}^{i = n} {({\text{MC}}_{{\text{i}}} - \widehat{{{\text{MC}}_{{\text{i}}} }})^{2} } }}{{\sum\nolimits_{i = 1}^{i = n} {({\text{MC}}_{{\text{i}}} - \overline{{{\text{MC}}_{{\text{i}}} }} )^{2} } }}$$
(24)
$${\text{MD = }}\frac{1}{n}\sum\limits_{i = 1}^{i = n} {({\text{MC}}_{{\text{i}}} - } \widehat{{{\text{MC}}_{{\text{i}}} }})$$
(25)
$${\text{TRE = }}{{\sum\limits_{i = 1}^{n} {\left| {{\text{MC}}_{{\text{i}}} - \widehat{{{\text{MC}}_{{\text{i}}} }}} \right|} } \mathord{\left/ {\vphantom {{\sum\limits_{i = 1}^{n} {\left| {{\text{MC}}_{{\text{i}}} - \widehat{{{\text{MC}}_{{\text{i}}} }}} \right|} } {\sum\limits_{i = 1}^{n} {\widehat{{{\text{MC}}_{{\text{i}}} }}} }}} \right. \kern-\nulldelimiterspace} {\sum\limits_{i = 1}^{n} {\widehat{{{\text{MC}}_{{\text{i}}} }}} }}$$
(26)
$${\text{RMSE = }}\sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {({\text{MC}}_{{\text{i}}} - \widehat{{{\text{MC}}_{{\text{i}}} }})^{2} } }$$
(27)

where, n is the number of sample plots, MCi is the actual value of the tree mortality in the i th sample plot, \(\widehat{MC}\) is the estimated value of the tree mortality in the i th sample plot, and \(\stackrel{-}{\mathrm{MC}}\) is the average of the observed value of the tree mortality. The smaller the MD, TRE, RMSE, and the larger R2, the better is the fit performance of the models.

Using MD, TRE, RMSE and R2 alone does not ensure whether the models fitted data optimally. The validity of the tree mortality models developed from the seven different base models can be evaluated using an independent data set. However, such a validation procedure was not feasible in this study because of the limited availability of data. Instead, the predictive performance of the tree mortality models was evaluated using the leave-one-out cross-validation (LOOCV) approach (Nord-Larsen et al. 2009; Timilsina and Staudhammer 2013).

Results

Basic tree mortality models

The parameter estimates and fit statistics of all the seven candidate models using three stand-level variables (DH, D, S) as predictors, which we have defined as basic tree mortality models, are presented in Table 5, and their models in Table 2.

Table 5 Parameter estimates and fit statistics of all the base models. R2: coefficient of determination; RMSE: root mean square error; TRE: total relative error

All the parameter estimates for each base model were significant at the 0.05 level. Model 3 and Model 5 showed better-fit statistics compared to the other models. Model 1 and Model 2 were inferior compared to the other models. The complex models provided better fits than the simpler ones did. These results also confirmed the superiority of the complex models or discrete data. Model 3 and Model 5 provided better fits than all other models did, suggesting that they were the best suited to the data structure and the sample plots with no mortality data (zero data).

Tree mortality models with dummy variable

When a dummy variable describing regional variations in tree mortality was added to parameter β1 in the seven models, fit statistics obtained were substantially better than those of their base model counterparts (Table 2) (see model forms in Table 3).

The parameter estimates of the dummy variable and all other parameters of each tree mortality model were significant (p < 0.05), except for β4 and β5 in the NB, ZINB and HNB models (Table 6). Model 10 and Model 12 provided a better fit than all the other models, with the greatest R2 and smallest RMSE and TRE. Model 9 was inferior to the other models, with the smallest R2 and greatest RMSE and TRE.

Table 6 Parameter estimates and fit statistics of tree mortality models with dummy variable included

Mixed-effects models with dummy variable and random effect

We formulated the tree mortality models using each of the seven base models by incorporating a dummy variable describing regional mortality variation and random effects accounting for forest block effects into the tree mortality models (see model forms in Table 4).

Except for Model 18, which did not converge with global minimum, the fit statistics of all the other mixed-effects dummy variable models were significantly improved, and all the parameter estimates of each model were significant (Table 7). Model 15 fitted better than all the other models, with the greatest R2, and the smallest RMSE and TRE. Model 16 showed an inferior fitting to other models.

Table 7 Parameter estimates and fit statistics of the mixed-effects models with dummy variable and random effects included

Model evaluation with LOOCV

We used only those predictor variables in the tree mortality models, which had VIF < 5, to insure no collinearity occurred among them. We used the selected variables in all model types: basic models, dummy variable models, and mixed-effects dummy variable models. We evaluated all these model types using the LOOCV approach. The prediction improvement was substantial through adding the dummy variable and random effects to the basic models (Table 8). For Model 19, R2 is the largest and RMSE is 7.1% lower than that of Model 15. Among the basic models (Eqs. 1–7), Model 3 and Model 5 had the most attractive prediction statistics. Model 10 and Model 12, which are the dummy variable models and the mixed-effects dummy variable model, Model 19, appeared to be the best in their prediction performance.

Table 8 Prediction statistics obtained with the leave-one-out cross-validation (LOOCV) approach

When we compared the observed and predicted tree mortality distribution patterns (Fig. 3), base Model 5, dummy variable Model 12, and mixed-effects Model 19 showed better fitting effects.

Fig. 3
figure 3

Observed and predicted distributions of tree mortality for Larix gmelinii subsp. principis-rupprechtii

Discussion

In this study, seven different mortality functions were considered for fitting the mortality data collected from two different regions of northern China, and their fitting performances were evaluated using common statistical measures.

Sample plot mean diameter (D) and stand basal area (S), which reflect the stand diameter growth, may describe the morality caused by stand density and competition. Dominant height (DH) may reflect the combined effects of site quality and stand development on the tree mortality.

Stand variables S and D can be calculated simply and accurately using diameter at breast height, which was the most reliably measurable variable in field survey data. In our models, tree mortality is significantly related to S, DH and D in the sample plot. Variable S had a positive correlation with tree mortality, indicating that S raised the tree mortality rate, which may be due to resource limitations in the stand. An increase in basal area per hectare may cause crowding, and the intense competition may increase the mortality rate in the forest (Dieguezaranda et al. 2005; Wiegand et al. 2006; Moustakas et al. 2008; Zhang et al. 2015). DH was also positively correlated with tree mortality, and with larger DH, tree mortality increased. Site conditions differ among regions and may be responsible for differences in tree mortality between the two regions with trees of the same dominant height.

Conversely, the effect of D on tree mortality was negative; that is, as the stand mean diameter became smaller, the tree mortality increased. This result indicates that tree mortality was more likely in forests with many small trees compared to forests with larger trees (Juknys et al. 2006; Larson and Franklin 2010).

When the dummy variable accounting for mortality variations due to regional differences was added, the model fit statistics slightly improved. However, basal area per hectare (S) appeared insignificant in the dummy variable models (negative binomial model (NB), zero-inflated negative binomial model (ZINB), and Hurdle negative binomial model (HNB)). The reason may be due differences in regions, such as Guandi Mountain and Wutai Mountain. In the different regions, D would be significantly different due to different sites in the two regions. However, the models had higher fitting accuracy when tree mortality models including the dummy variables and random effects were considered based on the basic models. Because the random effects were added as D and the intercept, the differences wre explained as the effects of stand mean diameter in different blocks.

For discrete data, the Poisson model and the negative binomial model have poor prediction accuracy in the basic model, while the zero-inflated Poisson model (ZIP) and Hurdle Poisson model (HP) had unique advantages for prediction. The prediction accuracy of the zero-inflated negative binomial model is lower than that of the zero-inflated Poisson model, which may be due to the numerous zero data that would not be in the expansion state (Long and Freese 2006).

Compared to the basic models, parameters of dummy variable models increased to a certain extent, and models index have improved. The dummy variable models can well integrate different areas of Larix stands, improve the accuracy of the tree mortality model, and expand the compatibility of the model.

If the logistic model is based on the maximum likelihood estimation, it can only estimate the probability of the dependent variable. Thus, it is not appropriate to use maximum likelihood estimation in our study. Any of the maximum likelihood-based criteria for model selection, such as the Akaike information criterion (Strawderman et al. 2000) cannot be applied to the logistic model. Alternatively, the leave-one-out cross-validation (LOOCV) was applied to evaluate prediction performance of the models.

The Poisson model performed well in principle with tree mortality, but was unable to account for the large zero fraction; the zero-inflated negative binomial, zero-inflated Poisson, and Hurdle negative binomial models overestimated the count part, but underestimated the zero part, resulting in a low prediction accuracy. Because of the complexity of tree mortality, it is difficult to interpret the fitted mortality function when the zero part was added. However, when the dummy variable (region) and random effects (block) were included into each of the seven base models, the prediction accuracy shown by LOOCV significantly improved, suggesting that there were significant variations in tree mortality caused by regional conditions and subject (forest block). This result justifies applying the mixed-effects dummy variable modeling approach in our study.

According to the field survey, altitude differenced inthe block is large, which is expected to greatly influence mortality of Larix species. However, we did not include any site variables such as altitude, aspect and slope into our models. In the future, these variables need to be considered in mortality modeling. Similarly, climate change also contributes to tree mortality on a large scale (Mantgem and Stephenson 2007; Kurz et al. 2008; Allen et al. 2010; Yang 2014; Hartmann et al. 2018), so climatic factors also need to incorporated into tree mortality models.

Conclusions

Fitting and comparison of the seven basic models through incorporation of the dummy variable describing regional effects and random components describing the forest block effects on the tree mortality led us to the following conclusions:

  1. (1)

    The models fitted with the dummy variable and random effects significantly improved fit statistics and prediction statistics compared with the basic models.

  2. (2)

    Among the various model formulations (basic, dummy, mixed models), the random effects for the Hurdle Poisson model described the largest variation in the tree mortality.

  3. (3)

    Tree mortality was significantly positively correlated with stand basal area and stand dominant height, but negatively correlated with sample plot mean diameter.

  4. (4)

    The models only considered mortality that was caused by competition; however, the impact of different regional climate scenarios on tree death is not yet clear and needs to be studied in the future.