Introduction

The benefits or services that forest ecosystems provide to people are often divided into social, ecological and economic categories, or social, environmental and economic aspects (e.g., Western et al. 2017). Sustainable forest management means that forestry should be sustainable in all these respects. Sometimes, cultural sustainability is mentioned as the fourth aspect, or social and cultural aspects are combined into the socio-cultural dimension of forest management. Cultural sustainability requires, for example, that forest management should correspond to people’s perception about the right way to manage forests (Shindler and Brunson 2004).

Economic sustainability usually refers to timber production and incomes obtained from timber sales. It requires that the current use of forests should not decrease future harvest levels, and the forest’s capability to generate economic returns is maintained. Environmental sustainability implies a non-decreasing flow of environmental, regulative and protective benefits, and ecological sustainability means that forestry should not jeopardize the viability of the populations of forest-dwelling species. Social sustainability has many definitions (Magee et al. 2013) but in the forestry context it often means respecting the customary uses of forests, among other things.

Kappen et al. (2020) classified the forest “values” into four dimensions: (1) climate-regulatory, (2) environmental, (3) commercial, and (4) social. They concluded that the climate-regulatory function is globally by far the most important. Baskent (2020) classified the ecosystem services as (1) provisioning, (2) regulating, (3) supporting, and (4) cultural services. Mikkilä et al. (2005) developed a hierarchy for the acceptability analysis of forest industries where the main criteria (called dimensions) were (1) economic/financial/technical, (2) environmental, (3) social, and (4) cultural/political.

The above three to five categories of forest benefits are not necessarily mutually exclusive. For example, society benefits from timber production and the incomes that forestry generates for forest landowners. In fact, social benefit is defined to be the sum of private benefits and externalities (e.g., Sloman and Garratt 2010). In forests managed primarily for timber production, the social benefit is the sum of timber benefits and the externalities of timber production. Following this line of thinking, Magee et al. (2013) did not consider social sustainability as a sub-category of overall sustainability but assumed that social sustainability consists of ecological, economic, cultural and political dimensions.

Whatever the classification, sustainability requires that forests should maintain all their functions. This calls for forest management that is balanced in the sense that one or a few forest functions should not be maximized to the detriment of the other functions. An increasingly important element of all aspects of sustainability is the resilience of forest ecosystems (Seidl and Lexer 2013; Gauthier et al. 2015). Resilience is a measure of the certainty of obtaining the ecosystem services in changing conditions, and the ability of the forest to resist disturbances and recover from them (Thompson et al. 2009; Messier et al. 2019). Messier et al. (2019) already anticipated the increasing importance of resilience when they classified the forest management objectives into six categories as follows: (1) timber and biomass, (2) biodiversity, (3) resilience, (4) carbon storage, (5) social acceptance, and (6) water quality. Emphasizing resilience, customary uses, biodiversity maintenance and environmental sustainability leads to the requirement for “strong sustainability” (e.g., Neumann et al. 2017) in forest management.

It is generally understood that mixed stands and structurally complex ecosystems are the most resistant and resilient (Knoke et al. 2008; Messier et al. 2019; Pardos et al. 2021). Messier et al. (2019) concluded that a resilient forest should have high species richness and high functional redundancy. The latter requirement means that there should be more than one species responsible for a certain forest function. This would guarantee that the forest would sustain its functions also in the case where one species disappears completely in a severe pest outbreak or due to an invasive pathogen (e.g., Möykkynen and Pukkala 2010; Möykkynen et al. 2017). Triviño et al. (2015), Messier et al. (2019) and Diaz-Yáñez et al. (2019), among others, concluded that a flexible use of different management approaches and silvicultural systems might be the best approach to guarantee a sustainable provision of all important ecosystem services of forests.

Many methods have been developed to assess the efficiency and performance of forest management. For example, data envelopment analysis and stochastic frontier analysis have been used to quantify the productive efficiency of forest management (Pukkala 2017; Lundmark et al. 2020). These metrics are related to the concept of Pareto optimality, which indicates the “technical” efficiency of production. The production is labelled efficient if it is not possible to increase any output or service without decreasing the production of at least one other service. When forest management plans are produced with optimization methods the plans are Pareto efficient in terms of those services that were considered in optimization.
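The definition of Pareto efficiency given above can be illustrated with a minimal sketch. The plan names and output values are hypothetical, and all outputs are assumed to be of the "more is better" type.

```python
def is_dominated(a, b):
    """True if b is at least as good as a in every output and strictly better in one."""
    return all(y >= x for x, y in zip(a, b)) and any(y > x for x, y in zip(a, b))


def pareto_efficient(plans):
    """Return the names of plans that no other plan dominates."""
    return [
        name
        for name, outputs in plans.items()
        if not any(
            is_dominated(outputs, other_outputs)
            for other, other_outputs in plans.items()
            if other != name
        )
    ]


# Hypothetical plans with two outputs each (e.g. timber and berries).
plans = {"A": (10.0, 2.0), "B": (8.0, 5.0), "C": (7.0, 4.0)}
efficient = pareto_efficient(plans)
```

In this toy data, plan C is dominated by plan B, whereas A remains efficient even though it is weak in the second output.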

The problem with Pareto efficiency is that management that is good in terms of one service but bad in several other services may be classified as efficient. Because of this, Pareto efficient forestry may not always be socially acceptable (Bennett et al. 2009). The concept of allocative efficiency has been developed to mitigate this problem (e.g., Susaeta et al. 2016). Allocative efficiency considers the preferences of forest users. Allocative efficiency implies that a service is produced up to the point where the marginal benefit of producing an additional unit equals the marginal cost of increased production. Production that is Pareto efficient is not necessarily allocatively efficient.

Eco-efficiency is another concept related to forest management (Angulo-Meza et al. 2019). Whereas productive efficiency maximizes the ratio between production and resource consumption, eco-efficiency aims at producing the services with minimal harmful environmental impacts (Angulo-Meza et al. 2019; Bianchi et al. 2020). A problem here is that the impacts of forest management cannot be straightforwardly classified into benefits and negative impacts. Besides, some variables calculated in forest planning are already differences between positive and negative impacts, or outputs and inputs. For example, carbon balance is the difference between carbon sequestration and carbon releases, and net income is the difference between revenues and costs.

Although there is much research on the social performance of forest management (see e.g. Shindler and Brunson 2004), there is no metric available that could be used to compare alternative management options of Finnish forests in terms of their social performance or acceptability. Four questions should be considered when developing such metrics: (1) what are the indicators of social performance, (2) how should the indicators be normalized, (3) what are the weights of the indicators and (4) which method is used to compose the performance index from the normalized indicator values. Normalization refers to the removal of the effect of different units of the indicators, and dealing with their ranges.

The current study examined possibilities to measure the performance of forestry when the aim is to provide a balanced and sustainable supply of ecosystem services. The target was an assessment system in which the performance measure can be calculated automatically, i.e., without the need for a case-specific preference analysis. The criteria used in the analyses were economic, “socio-cultural”, ecological, environmental and resilience, where the socio-cultural dimension refers to the customary uses of forests. Each criterion was measured with three numerical indicators, which can be calculated in the context of forest management planning. The methods that were used to synthesize the values of the indicators into a single performance index were adopted from multi-criteria analysis (Velasquez and Hester 2013; Kangas et al. 2015).

All five categories of performance indicators were assumed necessary, i.e., metrics that contribute to a balanced provision of different ecosystem services were targeted. Metrics that allow a full substitution were not regarded as acceptable. Substitution means that a failure in a certain indicator can be compensated for by a good performance in another indicator. An example of metrics that allow full substitution is the additive utility function (Kangas et al. 2015). The study analyzed the similarity of the rankings of alternative management plans when different metrics, calculated from the same indicators, were used. If the ranking is sensitive to the metric, the choice of the metric is an important question in the analysis of the social performance of forest management.

The ranking methods were compared in two case study forests, one representing the southern part of the boreal zone and the other representing northern boreal forest. The management plans for which the performance metrics were calculated represented different silvicultural systems, for instance, even-aged rotation forestry and continuous cover forestry. Therefore, the study also provided information about the social performance of alternative silvicultural systems.

Materials and methods

Materials

Data from two large forest holdings were used in the analyses, one representing southern Finland and the other representing the northern parts of the country (Table 1). The stands of the case study forests were surveyed using relascope inventory. The growing stock variables were measured separately for different tree species and canopy layers. The field inventory was conducted carefully in the sense that the minor tree species, with little economic importance, were also measured. If the tree canopy was two- or multi-storied, all growing stock variables were assessed separately for each canopy layer. The growing stock variables assessed for each stratum were: basal area (or, in young stands, number of trees per hectare), mean height, mean age and mean diameter.

Table 1 Information about the case study forest holdings

Mesic site, representing average fertility, was the most common site type in both areas (Table 1). The proportion of the most fertile sites (mesotrophic and herb-rich) was higher in the southern forest. The three poorest site classes, sub-xeric, xeric and barren heath, on which only pine grows well, covered 47% of the northern forest and 21% of the southern forest. Scots pine (Pinus sylvestris L.) accounted for most of the growing stock volume of the northern forest. It was the most common tree species also in the southern forest, but the species distribution was more uniform in the south (Table 1). The other major tree species were Norway spruce (Picea abies (L.) H. Karst.), silver birch (Betula pendula Roth), downy birch (B. pubescens Ehrh.) and aspen (Populus tremula L.). All stands represented mineral soil sites.

Management scenarios

Different management plans were developed for both case study forests in two steps. First, alternative treatment schedules were simulated for the stands for 100 years. The 100-year simulation period was divided into ten 10-year periods, and treatments were simulated in the middle of the period. Second, the best treatment schedules were selected for the stands assuming that the landowner maximizes the net present value of forest management. This selection was done three times using a 1%, 2% or 4% (real) discount rate.
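The discounting implied by this setup can be sketched as follows. This is only an illustration under stated assumptions: the net incomes per period are hypothetical, each income is discounted from the middle of its 10-year period, and any value assigned to the forest at the end of the horizon is ignored.

```python
def net_present_value(net_incomes, rate, period_length=10):
    """Discount one net income per period from the middle of that period."""
    return sum(
        income / (1.0 + rate) ** (period_length * k + period_length / 2)
        for k, income in enumerate(net_incomes)
    )


# Hypothetical net incomes (per hectare) for the first three 10-year periods.
incomes = [1000.0, 0.0, 2500.0]

# The same schedule evaluated with the three discount rates used in the study.
npv_by_rate = {rate: net_present_value(incomes, rate) for rate in (0.01, 0.02, 0.04)}
```

Selecting the schedule with the highest value for each stand, separately for each rate, corresponds to the second step described above.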

The treatment schedules were simulated under four alternative silvicultural systems. The first system was conifer-oriented even-aged management (referred to as CON). It represented the forest management that has been widely practiced in Finland during the past few decades (Pukkala 2017; Diaz-Yáñez et al. 2019). The stands were regenerated using clear-felling and artificial regeneration except the two poorest sites (xeric and barren heath) where natural regeneration for pine was used. The treatments that followed clear-felling were: cleaning the site from any remnant trees (non-merchantable small trees), mechanical site preparation, planting or seeding, weed control on the most fertile sites, and one or two pre-commercial thinnings. Sowing of pine seeds was used in sub-xeric sites, planting of spruce on mesic and herb-rich sites, and planting of silver birch on mesotrophic sites. The artificially regenerated young seedling stands were cleaned from natural regeneration a few years after planting or sowing, resulting in monocultures of silver birch, spruce or pine.

Thinning treatments were simulated as thinning from below. However, if the stand was two- or multi-storied, the cutting was simulated as the removal of the upper story whenever the advance regeneration was sufficient according to the criteria of the current forestry regulations (Äijälä et al. 2014). A thinning was simulated when the stand basal area reached the thinning limit of the silvicultural instructions (Äijälä et al. 2014) and a clear-felling or seed tree cut was simulated when the mean diameter of trees exceeded the required minimum of the instruction. The lower limits of the ranges of recommended thinning basal area or final felling diameter were used as the earliest moment of these cuttings. Alternative treatment schedules were simulated by postponing the cuttings by one or several 10-year periods.

The second silvicultural system (referred to as MIX) was also even-aged forestry but it aimed at maintaining mixed forests and increasing the share of broadleaf tree species. Similar to Pukkala (2018a), the commercial thinnings and the pre-commercial thinnings of young stands were simulated in such a way that the thinning intensity (percentage of removed trees) was proportional to the share of the species of stand basal area (commercial thinning) or number of trees per hectare (pre-commercial thinning). Another difference was that the planted tree species was varied on herb-rich and mesic sites. Spruce was planted on clear-felled herb-rich sites with 70% probability and silver birch with 30% probability. On the mesic site, the probabilities to plant pine, spruce and silver birch were all 33.3%. Otherwise, the simulation was similar to that of the conifer-oriented silvicultural system.

The third silvicultural system was continuous cover forestry (CCF) where final felling and artificial regeneration were not used. All thinnings were thinning from above, which were simulated according to instructions developed for continuous cover forestry (Pukkala 2017; Diaz-Yáñez et al. 2019).

The fourth management system was any-aged forestry (AAF) where the management was not categorized into even-aged or continuous cover management. Simulation was based on the three rules developed by Pukkala (2018b). The first rule indicated whether the stand was financially mature for cutting. If this was the case, another rule was used to find out whether the cutting should be final felling or thinning. If it was thinning, the third rule was consulted to obtain the thinning intensities for different diameter classes. These rules are based on a high number of optimizations where the net present value of forest management was maximized with different discount rates (Pukkala 2018b). When the rules are followed, the most common cutting is thinning from above (Diaz-Yáñez et al. 2019). Final fellings are simulated only for stands where almost all trees are financially mature and the amount of natural advance regeneration is low.

All these silvicultural scenarios have been described in more detail in previous studies (Pukkala 2017; 2018a; Diaz-Yáñez et al. 2019; Heinonen et al. 2020). These studies also explain the methods that were used to obtain several alternative treatment schedules for each stand. The basic method used in even-aged management scenarios was to postpone cuttings from their earliest allowed moment. In the CCF and AAF scenarios, alternatives were produced by applying the simulation instruction with different discount rates. The total number of treatment schedules simulated for the 478 stands (426.3 ha) of the southern forest was 11,590 in the CON scenario, 11,645 in MIX, 8631 in CCF and 8745 in AAF. In the northern forest (493.9 ha, 519 stands), the corresponding numbers were 10,695 (CON), 11,216 (MIX), 9231 (CCF) and 10,416 (AAF). One of the schedules simulated for every stand was a no-cutting schedule.

The treatment schedules were simulated subject to a few restrictions. The minimum harvest removal was 30 m3 ha−1. If the removal would have been smaller, the treatment was not simulated. In the southern forest, the minimum post-cutting basal area of the CCF and AAF scenarios was 13 (mesotrophic site), 12 (herb-rich), 11 (mesic), 9 (sub-xeric), 8 (xeric), or 7 (barren heath) m2 ha−1. In the northern forest, the minimum basal areas of these site fertility classes were 11, 10, 9, 7, 6, and 6 m2 ha−1, respectively. No minimum basal areas were set for even-aged forestry (CON and MIX) since the management recommendations used in the simulation (Äijälä et al. 2014) prevented too strong basal area reductions or final felling at too low mean diameter.

Models used in simulation

Stand development was simulated on an individual-tree basis. The models of Pukkala et al. (2021) for diameter increment, survival and ingrowth were used to simulate the stand dynamics. Tree height was calculated with the models of Pukkala et al. (2009) and assortment volumes with the taper functions of Laasasenaho (1982). The timber assortments and their roadside prices were the same as in Heinonen et al. (2020). Tree biomasses were required in carbon stock and carbon balance calculations and they were calculated using the biomass functions of Repola et al. (2007) and Repola (2009).

A tree breeding benefit of 10% was assumed for all planted trees (for trees planted during the 100-year simulation period) and a 5% benefit was assumed in sowing. The breeding effect was implemented by multiplying the predicted diameter increment by 1.1 (planting) or 1.05 (sowing). Moreover, the diameter increment was assumed to be 10% smaller than the model prediction for the first five-year period after all partial cuttings of the CCF and AAF scenarios. This was done to mimic the thinning stress of the remaining trees after removing the largest trees of the stand (Hynynen et al. 2019). It was assumed that a part of the advance regeneration is damaged in a partial cutting. The rate of damage was directly proportional to removed stem wood volume per hectare.
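The increment adjustments described above amount to simple multipliers, as in the following sketch. The function and dictionary names are hypothetical, and combining the breeding and thinning-stress multipliers multiplicatively is an assumption of the sketch, since the text does not say how the two would interact for the same tree.

```python
# Multipliers taken from the text: +10% for planted trees, +5% for sown trees.
BREEDING_MULTIPLIER = {"planted": 1.10, "sown": 1.05, "natural": 1.00}

# 10% reduction for the first five-year period after a partial cutting (CCF, AAF).
THINNING_STRESS_MULTIPLIER = 0.90


def adjusted_increment(predicted, origin, after_partial_cutting=False):
    """Scale a model-predicted diameter increment by the assumed effects."""
    increment = predicted * BREEDING_MULTIPLIER[origin]
    if after_partial_cutting:
        # Assumption: the stress effect multiplies the breeding-adjusted value.
        increment *= THINNING_STRESS_MULTIPLIER
    return increment


planted_growth = adjusted_increment(2.0, "planted")
stressed_growth = adjusted_increment(2.0, "natural", after_partial_cutting=True)
```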

Harvesting costs were based on the time consumption functions of Rummukainen et al. (1995) for harvesters and forwarders, and the current hourly costs of these machines (Diaz-Yáñez et al. 2020). Rummukainen et al. (1995) developed separate functions for final felling and partial cuttings. The functions take into account the slightly slower harvesting and forwarding in partial cuttings. The net return from cutting was obtained as the difference between the roadside values of harvested trees and the harvesting costs. The costs of silvicultural treatments were the same as those used in Diaz-Yáñez et al. (2019).

Alternative plans

Three management plans were composed for each silvicultural system. It was assumed that the forest landowner always maximizes economic profitability but the discount rate of the owner may vary. The net present value was maximized with a 1, 2 or 4% discount rate. The higher the discount rate, the earlier the cuttings are conducted, and the lower are the average growing stock volumes. In the case study forests of this study, a 2% rate maintained the average growing stock volume of the forest approximately at its initial level. A 1% discount rate leads to management where cuttings are late and growing stock volumes increase.

Heinonen et al. (2020) referred to landowners who use a 1% discount rate as “savers” and to landowners who use a high discount rate as “investors”. In the current study, the highest discount rate was 4% whereas in Heinonen et al. (2020) the investor’s discount rate was 5%. However, discount rates higher than 4% would lead to almost the same forest management as 4%, since the minimum post-thinning basal areas and minimum regeneration diameters would prevent the heavy cuttings and low stand densities that would otherwise be optimal in unconstrained maximization of NPV with discount rates higher than 4%.

Performance indicators

Fifteen performance indicators representing five dimensions of forest management (economic, socio-cultural, environmental, resilience, ecological) were calculated for each plan. The economic dimension was described with the net present value of timber production (NPV), mean annual sawlog harvest and mean annual pulpwood harvest. NPV describes the economic benefit of the forest landowner and the other two indicators describe the timber supply to sawmills and pulp and paper industries.

The socio-cultural dimension was described with the mean annual berry yield, mean annual mushroom yield and the mean scenic beauty index of the forest. These indicators are related to the most common customary uses of Finnish forests, which are berry and mushroom picking, and outdoor recreation. Berry yield estimates comprised bilberry (Vaccinium myrtillus) and lingonberry (Vaccinium vitis-idaea), the yields of which were calculated with the models of Kurttila et al. (2018). The predicted mushroom yields concern commercial mushrooms, which are largely the same as collected for households. The models of Kurttila et al. (2018) were used to calculate the mushroom yield estimates. Although the yield estimates do not describe the amount of berries and mushrooms that are actually collected, they correlate closely with the harvested amounts of these non-wood products. The scenic beauty estimates were calculated with the model of Pukkala et al. (1988). All indicators were calculated for each stand at the end of each 10-year period. The forest-level value was the area-weighted mean of the stand values. The stand value was the average of the estimates calculated for the 10-year periods.
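The aggregation from stand-level period estimates to a forest-level indicator can be sketched as below. The stand areas and values are hypothetical, and the data layout (a list of area and period-value pairs) is chosen only for illustration.

```python
def forest_level_value(stands):
    """Area-weighted mean of the temporal averages of stand values.

    stands: list of (area_ha, [value at the end of each 10-year period]).
    """
    total_area = sum(area for area, _ in stands)
    weighted_sum = sum(
        area * (sum(values) / len(values))  # stand value = average over periods
        for area, values in stands
    )
    return weighted_sum / total_area


# Two hypothetical stands with two periods each.
example_stands = [(1.0, [10.0, 20.0]), (3.0, [2.0, 2.0])]
forest_value = forest_level_value(example_stands)
```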

The environmental dimension was described with the carbon balance of the 100-year simulation period, average carbon stock of the forest, and average reflectance (albedo) of the forest canopy. Carbon balance was the difference between the carbon sequestration and carbon releases of forestry. It consisted of three sub-balances: living growing stock, dead organic matter and wood products. The carbon balance of a certain period was equal to net change in carbon stock. However, the carbon balance of wood products also included the carbon releases of harvesting, transport and product manufacturing and the avoided fossil emissions due to the use of wood (substitution effects). The carbon stocks of the forest included the carbon of living tree biomass and forest soils. The carbon balance and carbon stock were calculated in the same way as explained in detail in a recent study (Pukkala 2020). Albedo described the reflectance of the forest canopy in August, and it was calculated in the same way as in Pukkala (2018a).

The indicators of resilience were based on the assumption that a high species and size diversity of trees is good for resilience (Thompson et al. 2009; Jactel et al. 2017; Messier et al. 2019). The Shannon index was used to describe species diversity. It was calculated from the proportions of different tree species of stand basal area. Size diversity was described with the Gini index (Gini 1921), following the recommendation of Valbuena et al. (2012). These indices were calculated for each stand at the end of each 10-year period. The forest-level indicator was the area-weighted mean of the temporal averages of stand indices.
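The two diversity indices can be sketched for a single stand as follows. The Shannon index uses basal area proportions as described above. For the Gini index, the sketch uses the common mean-absolute-difference formulation on unweighted tree sizes; the exact size variable and weighting used in the study are not specified here, so this is an assumption.

```python
import math


def shannon_index(basal_areas):
    """Shannon index from species-wise basal areas (natural logarithm)."""
    total = sum(basal_areas)
    proportions = [b / total for b in basal_areas if b > 0]
    return -sum(p * math.log(p) for p in proportions)


def gini_index(sizes):
    """Gini coefficient of tree sizes via the mean absolute difference."""
    n = len(sizes)
    mean = sum(sizes) / n
    absolute_differences = sum(abs(a - b) for a in sizes for b in sizes)
    return absolute_differences / (2 * n * n * mean)


two_species_even = shannon_index([10.0, 10.0])   # equal shares of two species
identical_trees = gini_index([5.0, 5.0, 5.0])    # no size variation
```

A monoculture gives a Shannon index of zero, and a stand of identical trees gives a Gini index of zero.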

The third indicator of resilience was the standard deviation of the mean heights of stands. The choice of this variable was based on the fact that, in Finnish forests, wind throws are the most probable starting points of bark beetle outbreaks (for instance Ips typographus), and most wind throws occur at stand edges (Heinonen et al. 2009). Therefore, the best way to reduce the likelihood of pest outbreaks is to minimize vulnerable edges, which can be done by minimizing the height differences between adjacent stands (Heinonen et al. 2009). The standard deviation of mean heights was calculated for the last year of each 10-year period, after which the average of the 10 standard deviations was calculated.
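A minimal sketch of this indicator is given below. It assumes the population form of the standard deviation and no area weighting, neither of which is stated in the text; the height values are hypothetical.

```python
import statistics


def height_variation(mean_heights_by_period):
    """Average over periods of the standard deviation of stand mean heights.

    mean_heights_by_period: one list of stand mean heights (m) per period.
    """
    deviations = [statistics.pstdev(heights) for heights in mean_heights_by_period]
    return statistics.fmean(deviations)


# Two hypothetical periods with two stands each.
variation = height_variation([[10.0, 10.0], [0.0, 20.0]])
```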

The biodiversity indicators used in this study described the availability of resources that are often considered to limit the biodiversity of Finnish forests, namely the amount of deadwood, and the volume of broadleaf trees (Díaz-Yáñez et al. 2019). Aspen (Populus tremula) is considered to be especially valuable for biodiversity (Díaz-Yáñez et al. 2019) and its volume was therefore taken as a separate indicator. The volume of other broadleaf species was another indicator. The amount of deadwood was calculated by simulating the decomposition of the stem of each tree that died during the 100-year simulation period. Only trees larger than 10 cm were included in the dry mass of deadwood since the resource that most often limits biodiversity is large-sized deadwood (Tikkanen et al. 2007; Zubizarreta et al. 2019). The volume of aspen and broadleaves and the dry mass of deadwood were calculated for 10 time points, of which a 100-year average was obtained. Then the area-weighted average of all stands was calculated.

Ranking methods

The 15 indicators of the five dimensions of sustainable forestry were used to calculate the overall performance score for the 12 forest management plans (CON, MIX, CCF, AAF, all with three discount rates). A no-cutting plan was included as the 13th alternative as it helped to draw conclusions about the alternative ways to calculate the performance index.

The first performance index, abbreviated as MAA, was based on the multi-attribute approval voting (Kangas et al. 2015). A certain plan was given one point from each indicator in which it was better than the average.
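The approval rule can be sketched as follows. The plan names and indicator values are hypothetical, and the `lower_is_better` argument is an addition of the sketch for indicators such as the height variation, where a value below the average counts as better.

```python
def maa_scores(table, lower_is_better=()):
    """One point per indicator in which a plan is better than the average.

    table: {plan name: [indicator values]}, same indicator order for all plans.
    lower_is_better: indices of indicators where small values are good.
    """
    plans = list(table)
    n_indicators = len(next(iter(table.values())))
    scores = dict.fromkeys(plans, 0)
    for k in range(n_indicators):
        mean = sum(table[p][k] for p in plans) / len(plans)
        for p in plans:
            value = table[p][k]
            better = value < mean if k in lower_is_better else value > mean
            scores[p] += int(better)
    return scores


example = {"A": [3.0, 1.0], "B": [1.0, 1.0], "C": [2.0, 4.0]}
approval = maa_scores(example)
```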

The second index was Borda count (abbreviated as Borda), which is a variant of preference voting (Kangas et al. 2015). The 13 plans were ordered according to each indicator. Then, the best plan in a certain indicator was given 13 points, the second-best 12 points, etc. After doing this for all 15 indicators, the Borda count of a plan was calculated as the sum of the points obtained from the different indicators.
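A short sketch of the Borda count follows. The data are hypothetical, and ties are broken by input order, which is an assumption since the text does not describe tie handling.

```python
def borda_counts(table, lower_is_better=()):
    """Sum of Borda points over indicators; the best plan gets len(plans) points."""
    plans = list(table)
    n_indicators = len(next(iter(table.values())))
    points = dict.fromkeys(plans, 0)
    for k in range(n_indicators):
        # Order plans from best to worst according to indicator k.
        ordered = sorted(
            plans, key=lambda p: table[p][k], reverse=k not in lower_is_better
        )
        for position, p in enumerate(ordered):
            points[p] += len(plans) - position
    return points


example = {"A": [3.0, 1.0], "B": [1.0, 2.0], "C": [2.0, 4.0]}
borda = borda_counts(example)
```

With 13 plans, the best plan in an indicator would receive 13 points, as in the text.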

The third performance index (Ranks) was based on reciprocal rankings of the plans according to different indicators (Kangas et al. 2015). The best plan was given 1/1 points, the second-best 1/2 points, etc. The formula was as follows:

$$P_{i}^{{{\text{Rank}}}} = \mathop \sum \limits_{k = 1}^{15} 1/R_{ik}$$
(1)

where \(R_{ik}\) is the rank of plan i according to indicator k.
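Equation 1 can be sketched as below, with hypothetical data. Ranks are derived by sorting, ties are broken by input order, and all indicators are assumed to be of the "more is better" type unless listed in `lower_is_better`; these choices are assumptions of the sketch.

```python
def reciprocal_rank_scores(table, lower_is_better=()):
    """Sum over indicators of 1 / rank (Eq. 1), where rank 1 is the best plan."""
    plans = list(table)
    n_indicators = len(next(iter(table.values())))
    scores = dict.fromkeys(plans, 0.0)
    for k in range(n_indicators):
        ordered = sorted(
            plans, key=lambda p: table[p][k], reverse=k not in lower_is_better
        )
        for rank, p in enumerate(ordered, start=1):
            scores[p] += 1.0 / rank
    return scores


example = {"A": [3.0, 1.0], "B": [1.0, 2.0], "C": [2.0, 4.0]}
rank_scores = reciprocal_rank_scores(example)
```

Compared with the Borda count, the reciprocal form rewards top positions more strongly, since the gap between ranks 1 and 2 is larger than the gap between ranks 12 and 13.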

The fourth and fifth indices (rSMAA1 and rSMAA3) were based on the reciprocal ranks (Eq. 1) and the principles of stochastic multi-criteria acceptability analysis (Lahdelma et al. 1998; Kangas et al. 2003). First, random weights (\(w_{k}\)) were generated for the indicators. Second, the weighted sum of the reciprocal ranks was calculated as:

$$P_{i}^{{{\text{SMAA}}}} = \mathop \sum \limits_{k = 1}^{15} w_{k} \frac{1}{{R_{ik} }}$$
(2)

This was repeated with 10,000 combinations of random weights. Based on these 10,000 total scores, the probability to be the best (rSMAA1) or among the three best management plans (rSMAA3) was calculated for each plan.
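The procedure can be sketched as a small Monte Carlo loop. The ranks below are hypothetical, and drawing the weights from a uniform distribution on [0, 1) is an assumption of the sketch; the fixed seed only makes the example reproducible.

```python
import random


def rsmaa(ranks, top=1, draws=10_000, seed=0):
    """Share of random weight sets in which each plan is among the `top` best.

    ranks: {plan name: [rank of the plan in each indicator]}, rank 1 is best.
    """
    rng = random.Random(seed)
    plans = list(ranks)
    n_indicators = len(next(iter(ranks.values())))
    hits = dict.fromkeys(plans, 0)
    for _ in range(draws):
        weights = [rng.random() for _ in range(n_indicators)]
        # Weighted sum of reciprocal ranks (Eq. 2).
        score = {
            p: sum(w / r for w, r in zip(weights, ranks[p])) for p in plans
        }
        for p in sorted(plans, key=score.get, reverse=True)[:top]:
            hits[p] += 1
    return {p: hits[p] / draws for p in plans}


example_ranks = {"A": [1, 1, 1], "B": [2, 3, 2], "C": [3, 2, 3]}
prob_best = rsmaa(example_ranks, top=1)       # analogous to rSMAA1
prob_top_two = rsmaa(example_ranks, top=2)    # analogous to rSMAA3, with top=3
```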

The sixth performance index (ComPro) was based on compromise programming (Diaz-Balteiro et al. 2011). The performance index was computed as (Poff et al. 2010):

$$P_{i}^{{{\text{ComPro}}}} = \left[ {\mathop \sum \limits_{k = 1}^{15} \left( {\frac{{q_{k}^{{{\text{Max}}}} - q_{ik} }}{{q_{k}^{{{\text{Max}}}} - q_{k}^{{{\text{Min}}}} }}} \right)^{a} } \right]^{\frac{1}{a}}$$
(3)

where \(q_{k}^{\text{Min}}\) is the minimum and \(q_{k}^{\text{Max}}\) is the maximum (“ideal”) value of indicator k, \(q_{ik}\) is the value of indicator k in management plan i, and a is a parameter, which was set equal to one. If a low value of the indicator was targeted, which was the case for the height variation between stands, the relative deviation from the ideal value was calculated as \(\left( {\frac{{q_{ik} - q_{k}^{{{\text{Min}}}} }}{{q_{k}^{{{\text{Max}}}} - q_{k}^{{{\text{Min}}}} }}} \right)\). The ideal value was the largest possible value (\(q_{k}^{\text{Max}}\)) for all indicators except the standard deviation of the mean height of stands, where the ideal value was the lowest possible value (\(q_{k}^{\text{Min}}\)). The lowest possible value of an indicator was assumed to be 0.9 times the smallest value among the 13 plans, and the largest possible value was 1.1 times the largest value among the 13 plans.
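Equation 3, together with the 0.9 and 1.1 scaling of the extreme values, can be sketched as follows. The data are hypothetical and the sketch assumes positive indicator values, because multiplying a negative minimum by 0.9 would not lower it. The index is a distance from the ideal, so a smaller value indicates better performance.

```python
def compro_index(table, lower_is_better=(), a=1.0):
    """Compromise programming distance from the ideal point (Eq. 3)."""
    plans = list(table)
    n_indicators = len(next(iter(table.values())))
    distance = dict.fromkeys(plans, 0.0)
    for k in range(n_indicators):
        column = [table[p][k] for p in plans]
        q_min = 0.9 * min(column)   # assumed lowest possible value
        q_max = 1.1 * max(column)   # assumed largest possible value
        for p in plans:
            q = table[p][k]
            # Deviation from the ideal: q_max normally, q_min if low is good.
            deviation = (q - q_min) if k in lower_is_better else (q_max - q)
            distance[p] += (deviation / (q_max - q_min)) ** a
    return {p: total ** (1.0 / a) for p, total in distance.items()}


example = {"A": [10.0], "B": [20.0]}
high_is_good = compro_index(example)
low_is_good = compro_index(example, lower_is_better={0})
```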

The seventh performance index (Utility) was computed from the multiplicative Cobb–Douglas utility function (Douglas 1976):

$$P_{i}^{{{\text{Utility}}}} = \mathop \prod \limits_{k = 1}^{15} p_{ik}^{{w_{k} }}$$
(4)

where \(p_{ik}\) is the score that plan i gets from indicator k and \(w_{k}\) is the weight of indicator k. All 15 indicators were assumed to be equally important (\(w_{k} = 1\) for all k). Equation 4 is also called the conjunctive utility function (Kangas et al. 2015). The original values of the indicator variables were normalized in the same way as in the compromise programming approach:

$$p_{ik} = \frac{{q_{k}^{{{\text{Max}}}} - q_{ik} }}{{q_{k}^{{{\text{Max}}}} - q_{k}^{{{\text{Min}}}} }}$$
(5)

when a high value of the indicator indicated good performance, and

$$p_{ik} = \frac{{q_{ik} - q_{k}^{{{\text{Min}}}} }}{{q_{k}^{{{\text{Max}}}} - q_{k}^{{{\text{Min}}}} }}$$
(6)

when a low value indicated good performance.
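The multiplicative form of Eq. 4 can be sketched as below. The function takes already normalized partial scores as input and does not repeat the normalization; the score values are hypothetical. The comparison with an additive sum illustrates why a product does not allow full substitution between indicators.

```python
import math


def cobb_douglas(partial_scores, weights):
    """Product of partial scores raised to their weights (Eq. 4)."""
    return math.prod(p ** w for p, w in zip(partial_scores, weights))


equal_weights = [1.0, 1.0]
balanced = [0.5, 0.5]      # moderate score in both indicators
unbalanced = [0.9, 0.1]    # high score in one, low in the other

balanced_product = cobb_douglas(balanced, equal_weights)
unbalanced_product = cobb_douglas(unbalanced, equal_weights)

# The additive sums are equal, but the products are not.
balanced_sum = sum(balanced)
unbalanced_sum = sum(unbalanced)
```

The two score vectors have the same additive sum, yet the product of the balanced vector is clearly larger, so a low partial score cannot be fully offset by a high one.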

The last two performance indices (uSMAA1 and uSMAA3) were based on the idea of stochastic multi-attribute acceptability analysis (Lahdelma et al. 1998; Kangas et al. 2003). They were calculated so that 10,000 random combinations of indicator weights (\(w_{k}\) in Eq. 4) were generated and the total score of each plan was calculated with all 10,000 combinations of random weights. The weights were drawn from a uniform distribution, and they were scaled so that their average was equal to one. Figure 1 illustrates the effects of two random sets of 15 weights on the relationship between the indicator variable and the partial score. The 10,000 total scores calculated for each plan were used to calculate the probability that the plan is the best (uSMAA1) or among the three best plans (uSMAA3).
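The weight generation and the resulting probabilities can be sketched as follows. The partial scores are hypothetical, the uniform draws are taken from [0, 1), and the sketch treats a larger product as better, which presumes partial scores scaled so that larger means better; all three points are assumptions of the sketch.

```python
import math
import random


def scaled_uniform_weights(n, rng):
    """Uniform random weights rescaled so that their average equals one."""
    raw = [rng.random() for _ in range(n)]
    mean = sum(raw) / n
    return [w / mean for w in raw]


def usmaa(scores, top=1, draws=10_000, seed=0):
    """Share of weight sets in which each plan is among the `top` best (Eq. 4)."""
    rng = random.Random(seed)
    plans = list(scores)
    n_indicators = len(next(iter(scores.values())))
    hits = dict.fromkeys(plans, 0)
    for _ in range(draws):
        weights = scaled_uniform_weights(n_indicators, rng)
        total = {
            p: math.prod(s ** w for s, w in zip(scores[p], weights)) for p in plans
        }
        for p in sorted(plans, key=total.get, reverse=True)[:top]:
            hits[p] += 1
    return {p: hits[p] / draws for p in plans}


example_scores = {"A": [0.9, 0.8], "B": [0.5, 0.6], "C": [0.2, 0.3]}
prob_best = usmaa(example_scores, top=1)      # analogous to uSMAA1
prob_top_two = usmaa(example_scores, top=2)   # analogous to uSMAA3, with top=3
sample_weights = scaled_uniform_weights(15, random.Random(1))
```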

Fig. 1

Effect of indicator weight (exponent of Eq. 4) on the contribution of the indicator to the performance index calculated with Eq. 4. The diagrams show the effects of the indicators on their partial utility for two random sets of indicator weights

Results

Values of indicators

When the net present value, calculated with a 3% discount rate, was used as the indicator of the overall profitability of forestry, the results suggest that CCF and AAF were more profitable than the even-aged management systems CON and MIX (Tables 2 and 3). Timber supply was also higher in the CCF and AAF systems than in the even-aged systems. This result was obtained despite the 5% or 10% tree breeding benefit that was assumed in artificial regeneration. The 10% reduction of the diameter increment during the first 5-year period after thinning did not alter the yield ranking of the silvicultural systems either. A systematic result was that maximizing NPV with a high discount rate increased the share of pulpwood of the harvested timber. This is logical since a low discount rate means that the financial maturity for cutting is reached at a larger tree diameter, which decreases the share of pulpwood of harvested wood.

Table 2 Values of indicator variables in different silvicultural systems in the southern case study forest when net present value was maximized with a 1% or 4% discount rate or there were no cuttings
Table 3 Values of indicator variables in different silvicultural systems in the northern case study forest when net present value was maximized with a 1% or 4% discount rate or there were no cuttings

The berry yields were of the same magnitude in all silvicultural systems, most probably because final fellings often increase lingonberry yields but decrease bilberry yields (Kurttila et al. 2018). In the southern forest, the mushroom yields were about 10% higher in the CCF and AAF systems, compared to rotation forest management (CON and MIX). The mean scenic beauty index was lower in the conifer-oriented rotation forestry, as compared to the other management systems.

Both carbon balance and carbon stocks were clearly higher when NPV was maximized with a 1% rate, compared to the use of a 4% rate. A low discount rate led to high growing stock volumes, which was beneficial for the total carbon balance of forestry, which also included the substitution effects of harvested timber. Albedo (in August) was the highest in the management system that aimed at increasing the share of broadleaf species (MIX). The result is logical since broadleaf canopies have higher reflectance than conifer canopies. The use of a high discount rate, which led to lower stand densities, increased albedo. Most probably the increase would be larger during the snow period, since a dense canopy reduces the effect of snow on the reflectance, especially if the canopy consists of conifers.

The Shannon index, which describes species diversity, was much lower in the conifer-oriented rotation forestry than in the other management systems. CCF and AAF led to the highest species diversity. They also had the highest size diversity (Gini index). The third resilience indicator was the standard deviation of the mean height of the stands. A low value indicates a low risk of wind damage. Logically, the height variation was the largest in rotation forest management, making these systems less resistant to windthrow and more vulnerable to bark beetle attacks.
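For readers unfamiliar with the two diversity indices, the following sketch shows their textbook definitions. It is not taken from the study: the function names are hypothetical, and the choice of input (species volumes for the Shannon index, tree sizes such as diameters for the Gini index) is an assumption made for illustration only.

```python
import numpy as np

def shannon_index(volumes):
    """Shannon index H = -sum(p_i * ln p_i) over species shares p_i."""
    v = np.asarray(volumes, dtype=float)
    p = v[v > 0] / v.sum()          # species with zero volume contribute nothing
    return float(-(p * np.log(p)).sum())

def gini_index(sizes):
    """Gini coefficient of a size distribution (0 = all trees equal)."""
    x = np.sort(np.asarray(sizes, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    # Standard rank-based formula for sorted, non-negative values.
    return float(2.0 * (i * x).sum() / (n * x.sum()) - (n + 1) / n)

# Hypothetical stands: a pure conifer stand versus an even two-species mixture.
h_pure = shannon_index([100.0, 0.0])
h_mixed = shannon_index([50.0, 50.0])
# Hypothetical tree sizes: identical trees versus one large tree among small ones.
g_even = gini_index([10.0, 10.0, 10.0])
g_uneven = gini_index([0.0, 0.0, 0.0, 1.0])
```

Both indices increase with heterogeneity: the mixed stand has a higher Shannon index than the pure stand, and the uneven size distribution has a higher Gini coefficient than the even one.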

The most systematic result concerning the amount of deadwood was that maximizing NPV with a high discount rate decreased the amount of deadwood. The highest volumes of aspen were reached in even-aged management, probably because clear-felling increased the natural regeneration of aspen. However, the aspen volume was by far the highest in the no-cutting system, in which there was no natural regeneration of aspen on clear-felling sites. This means that the other management scenarios could maintain higher aspen volumes by leaving the existing aspens as retention trees. In the southern forest, the volume of broadleaf species other than aspen was the highest in rotation forestry that aimed at mixed forests. However, a clear margin to other silvicultural scenarios was obtained only when NPV was maximized with a low discount rate. In the northern case study forest, broadleaf volume was clearly lower in the conifer-oriented rotation forestry than in the other management systems.

The relative values of the indicator variables (Eqs. 5 and 6) in the no-cutting scenario and the four silvicultural systems are depicted in Fig. 2 for the case in which NPV was maximized with a 2% discount rate. The figure shows that the CCF and AAF scenarios were very similar, because most cuttings of the AAF scenario were thinnings from above. The no-cutting scenario did not perform well in economic profit, timber supply and albedo but it was better than the others in the biodiversity indicators, carbon sequestration, and customary uses of forests. The AAF and CCF scenarios were better than the CON and MIX scenarios in all indicators that were used to describe resilience, namely Shannon index, Gini index, and between-stand height variation.

Fig. 2

Normalized values of the indicators in the no-cutting plan and in the other management systems when net present value is maximized with a 2% discount rate. Wind resistance is equal to one minus normalized standard deviation of the mean heights of stands. CON = conifer-oriented even-aged management, MIX = mixed-stand-oriented even-aged management, CCF = continuous cover forestry, AAF = any-aged forestry. Profitability is equal to NPV calculated with a 3% discount rate

Aiming at mixed stands and an increasing presence of broadleaves improved the performance of rotation forest management in albedo, species diversity (Shannon), size diversity (Gini) and the volume of aspen and other broadleaves. There was also a slight improvement in mushroom yields and the scenic beauty index. There was no deterioration in any indicator, not even in timber production or NPV, although it is known that the wood production and economic return of broadleaf stands are lower than those of conifer stands (e.g. Pukkala 2018a). However, a mixture of broadleaves improves the growth and enhances the regeneration of conifers, especially spruce (Pukkala et al. 2021). These effects were taken into account in the diameter increment and ingrowth models used in simulation, and they may explain the competitive economic performance and timber production of the MIX scenario, as compared to conifer-oriented management.

The effect of the discount rate is visualized in Fig. 3. In rotation forest management, maximizing NPV with a 1% discount rate was better than maximizing it with a 4% rate in mushroom yields, scenic beauty, carbon balance, carbon stocks and the amount of deadwood. The use of a high discount rate clearly increased the pulpwood harvest of both forests and the sawlog harvest of the southern forest.

Fig. 3

Normalized values of the indicators in the conifer-oriented even-aged management (CON) and continuous cover forestry (CCF) when net present value is maximized with a 1% or 4% discount rate. Wind resistance is equal to one minus normalized standard deviation of the mean heights of stands

Continuous cover management was less sensitive to the discount rate, which might be related to the minimum allowed stand basal areas. The effect of the discount rate was the strongest in carbon stock, carbon balance and the amount of deadwood, for which a low discount rate resulted in better values. On the other hand, a higher discount rate and, consequently, lower stand densities resulted in greater species and size diversity, and higher albedo.

Ranking of the management plans

The rankings of the management plans depended on the performance metric (Tables 4 and 5). Especially noteworthy is the high sensitivity of the no-cutting plan to the metric that was used to rank the plans. The no-cutting plan was the best when the performance metric was computed from the reciprocal ranks. However, when the index was based on the multiplicative Cobb–Douglas utility function, the performance index of the no-cutting plan was always zero.

Table 4 Rank order of the management plans of the southern case study forest according to different performance indices (the best on top)
Table 5 Rank order of the management plans of the northern case study forest according to different performance indices (the best on top)

The rankings of the other plans were more systematic but not unambiguous. In the southern forest, the order of the silvicultural systems was CCF > AAF > MIX > CON, and the most common order of discount rates was 1% > 2% > 4%. However, the effect of the discount rate was not systematic, especially when the performance index was based on reciprocal ranks. Conifer-oriented rotation forestry always ranked poorly.

In the northern case study forest, the mixed-stand-oriented rotation forest management was competitive with the CCF and AAF systems (Table 5). It was even the best when the discount rate was 1% and the performance index was based on compromise programming or the unweighted Cobb–Douglas utility function. It also had the highest probability of being the best when the performance index was based on the Cobb–Douglas function and a large number of random sets of indicator weights (uSMAA1). Conifer-oriented rotation forestry ranked low, especially when the discount rate was high.

Table 6 Ranking of the management plans of the southern case study forest according to different performance indices when the normalization is based on the assumption that the lowest possible value of the indicator variable is zero (qkMin in Eqs. 6 and 7 is zero)

Sensitivity to normalization method

The values of the indicator variables were normalized based on the assumption that the lowest value among the 13 alternative management plans (more exactly, 0.9 times the lowest value) is worthless and 1.1 times the highest value is ideal (Eqs. 5 and 6). This normalizing method implies that the weights given to the indicators describe the importance of changing the value of the indicator from its lowest level to its highest level. As a result, indicators for which the alternatives were almost equal may have been overemphasized in the analysis. Examples of these variables were berry and mushroom yields, and albedo.

To analyze the sensitivity of the ranking to the normalizing method, the analysis was repeated by assuming that the lowest possible value of the indicator is zero, i.e., the normalized variable measured distance from zero. Figure 4 shows that the management plans now differ much less than before (Fig. 2) in the normalized values of berry and mushroom yields, scenic beauty, carbon stock, carbon balance and albedo. The overall effect of changing the normalizing method was that the cutting plans were more competitive with the no-cutting plan in those indicators where the no-cutting plan was good.
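The difference between the two normalizing methods can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a linear rescaling between the "worthless" and "ideal" levels described in the text, it assumes positive indicator values for which a higher value is better, and the function name `normalize` and the example numbers are hypothetical.

```python
import numpy as np

def normalize(x, zero_floor=False):
    """Linearly rescale indicator values between a worthless and an ideal level.

    x: array of shape (n_plans, n_indicators), positive values, higher = better.
    zero_floor=False: worthless level is 0.9 times the lowest value among plans.
    zero_floor=True:  worthless level is zero.
    In both cases the ideal level is 1.1 times the highest value among plans.
    """
    x = np.asarray(x, dtype=float)
    q_min = 0.0 if zero_floor else 0.9 * x.min(axis=0)
    q_max = 1.1 * x.max(axis=0)
    return (x - q_min) / (q_max - q_min)

# Hypothetical indicator for which the plans are almost equal (e.g. albedo).
albedo = np.array([[0.120], [0.125], [0.130]])
q_range = normalize(albedo)                   # about 0.34, 0.49, 0.63
q_zero = normalize(albedo, zero_floor=True)   # about 0.84, 0.87, 0.91
spread_range = float(q_range.max() - q_range.min())
spread_zero = float(q_zero.max() - q_zero.min())
```

With the range-based method the small absolute differences are stretched over a wide part of the normalized scale, whereas with the zero floor the plans remain close to each other, which is the effect seen for the nearly equal indicators in the sensitivity analysis.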

Fig. 4

Normalized values of the indicator variables when the lowest indicator value (qkMin in Eqs. 6 and 7) is zero and net present value is maximized with a 2% discount rate. Wind resistance is equal to one minus normalized standard deviation of the mean heights of stands

Despite these changes in the normalized values of indicators, the rankings of the management plans did not change much (Tables 6 and 7). The most systematic effect was that the ranking of the no-cutting plans was worse when the ranking method was multi-attribute acceptability analysis (MAA) or compromise programming (ComPro). Otherwise, the ranking was not particularly sensitive to the normalization method.

Table 7 Ranking of the management plans of the northern case study forest according to different performance indices when the normalization is based on the assumption that the lowest possible value of the indicator variable is zero (qkMin in Eqs. 6 and 7 is zero)

Discussion

The study presented an approach for ranking forest management plans based on their “social” or overall performance. The aim was to measure both the quantity and composition of ecosystem services. The term social efficiency was not used since the term is already reserved for a different purpose. Forestry is defined to be socially efficient when the marginal social benefit (private benefit plus external benefits) of increased production is equal to the marginal social cost (private cost plus external cost) of the additional production (Sloman and Garratt 2010).

The metrics could also have been called sustainability, acceptability, or responsibility indices. However, this was not done because sustainability, acceptability or responsibility do not guarantee efficient production. If sustainability means maintaining the current production level, it may not necessarily correspond to the most acceptable or responsible production (Lundmark et al. 2020). For example, if the carbon sequestration and carbon stocks of forests could be increased from their current levels, it may not be responsible or acceptable to refrain from increased carbon sequestration.

However, a good rank in the performance metric most probably reflects a high degree of sustainability, acceptability, and responsibility. This is because a balanced production of different ecosystem services was targeted, which is most probably important for acceptability. Resilience and biodiversity maintenance, which are the key requirements for long-term sustainability, were explicitly considered through several indicators. When calculating the indicators other than NPV, all periods were considered equally important since the benefits were not discounted. This also contributed to the long-term sustainability of the production.

The indicators were divided into five categories, which were economic, ecological, environmental, socio-cultural, and resilience. The classification as such is not important, and several alternative classifications could have been used (see e.g., Messier et al. 2019; Baskent 2020; Kappen et al. 2020). The main role of the classification was to provide a check that all relevant aspects of a sustained and responsible production of ecosystem services were considered.

The purpose was to propose a method that forest landowners can use to evaluate their forest management from society’s perspective, or in terms of responsibility and acceptability. The case study plans assumed that the forest owner maximizes the NPV of timber production. However, this does not need to be the case always, since the landowners may also have non-economic management objectives. Maximization of the metrics developed in this article could also be taken as the management objective, which would make sense especially in publicly owned forests.

The performance metrics can be used to check whether the intended forest management is going in a desirable direction, which silvicultural system or harvest level is the most desirable, or how much social benefit is lost if private benefits are increased. Besides forest landowners, these calculations may also serve investors and policymakers. Policymakers may use the analyses for planning forestry subsidies and taxes or for deciding on the legal limits of forest management. Another potential user group is investors who are seeking responsible investments (Serafeim 2020). Economically profitable forestry that sequesters carbon from the atmosphere, maintains biological diversity and is socially acceptable might be the most wanted type of forest management for these investors (Kappen et al. 2020).

The results showed that the choice of the multi-criteria metric used as the performance index may sometimes be important but this conclusion also depends on the set of management alternatives that are compared. When selecting the set of the tested metrics, the purpose was to avoid situations where a good performance in one indicator compensates for a poor performance in another indicator. Because of this, the additive utility function was not used as a performance metric.

However, some of the selected metrics did not meet the requirement for a balanced production of ecosystem services. This was the case especially for the metrics that were based on reciprocal ranks. Although these metrics were not calculated from the quantities of the indicators, they allowed a poor rank in one indicator to be compensated by a good rank in another. This happened especially with the no-cutting plans, which did not produce timber for the forest industries (Irauschek et al. 2017; Mönkkönen et al. 2018). Since the performance metrics were targeted at managed commercial forests, an index where the no-cutting plan is the best cannot be regarded as acceptable. Therefore, the performance metrics based on reciprocal ranks cannot be recommended.

The most desirable performance metrics, when aiming at a balanced production of different ecosystem services, might be those based on the multiplicative Cobb–Douglas utility function. In this function, the exponent of the normalized indicator (Eq. 4) is the “importance” of the indicator. A low value of the exponent means that it is important to produce at least a small amount of the service whereas a large exponent means that large amounts are clearly better than small amounts. Whatever the weights of the indicators, the performance metric is equal to zero if any of the indicator variables equals zero. If there is no information on the indicator weights, equal weights or random weights with the SMAA approach can be used. The SMAA metrics measure the probability of being the best or among the best management scenarios under the assumption that the indicator weights may vary.
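The non-compensatory behaviour of the multiplicative function can be seen in a small numerical sketch. The code assumes that Eq. 4 has the usual Cobb–Douglas form (product of normalized indicators raised to their weights) and uses a weighted mean as a stand-in for the additive utility function; the function names and the indicator values are hypothetical.

```python
import numpy as np

def additive_score(q, w):
    """Weighted mean of normalized indicators (compensatory)."""
    return float(np.sum(w * q) / np.sum(w))

def cobb_douglas_score(q, w):
    """Product of q_k ** w_k (zero as soon as any indicator is zero)."""
    return float(np.prod(q ** w))

w_equal = np.ones(3)
balanced = np.array([0.6, 0.6, 0.6])     # moderate in every indicator
unbalanced = np.array([1.0, 1.0, 0.0])   # ideal in two, nothing in the third

add_bal = additive_score(balanced, w_equal)        # 0.6
add_unb = additive_score(unbalanced, w_equal)      # 2/3, wins additively
cd_bal = cobb_douglas_score(balanced, w_equal)     # 0.6 ** 3 = 0.216
cd_unb = cobb_douglas_score(unbalanced, w_equal)   # 0.0

# Effect of the exponent on the partial score of a low indicator value (0.1):
low_exponent = 0.1 ** 0.2    # roughly 0.63: a small amount already counts a lot
high_exponent = 0.1 ** 2.0   # 0.01: only large amounts contribute much
```

Under the additive function the unbalanced plan is ranked first because two ideal indicators compensate for the missing third one, whereas under the multiplicative function its score is zero regardless of the weights.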

The case study results showed that the ranking of alternative plans may be sensitive to the normalizing method that converts the original indicator variables into relative values (Diaz-Balteiro et al. 2018). Two different normalization methods were compared. A third option would be to use single-objective optimization to find the lowest possible and highest possible value of each indicator. This would link the normalization method to the production possibilities of a particular forest. Another possibility is to use trade-off analyses and base the normalization on the rational range of each indicator (Díaz-Yáñez et al. 2020). A further option is to consult experts to set the minimum acceptable and ideal levels of those indicators for which the benefit is not directly proportional to the value of the variable. This might be advisable especially for the indicators of biodiversity (Gustafsson et al. 2012). For example, a certain minimum amount of large-sized deadwood may be necessary for the viability of saproxylic forest species (Tikkanen et al. 2007; Zubizarreta et al. 2019) but increasing the amount of deadwood to a very high level may no longer increase biodiversity.

The current study developed methods for ranking alternative plans. The methods cannot be used to evaluate the performance of a single forest management plan. For this type of analysis, the ranking method presented here should be further developed into a rating method. A rating method would require that the normalized values of the indicator variables do not depend on the set of management plans that are compared. Most of the indicator variables used in this study are additive in the sense that their forest-level values can be calculated as the sum or mean of the stand values. The minimum and ideal values (or theoretical minimum and maximum) of the indicator can be defined for different stand types and latitudes, which would make it possible to calculate the normalized indicators first for each stand. The forest-level metric is then obtained as an area-weighted average of the stand values.
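The proposed stand-wise rating could look roughly as follows for a single indicator. This is a sketch under stated assumptions rather than a developed method: the linear normalization, the clipping of values to the range from zero to one, the function name `forest_rating` and all numbers are hypothetical.

```python
import numpy as np

def forest_rating(stand_values, stand_min, stand_ideal, areas):
    """Area-weighted mean of stand-level normalized indicator values.

    stand_min and stand_ideal are the minimum and ideal levels of the
    indicator for each stand (e.g. by stand type and latitude), so the
    result does not depend on which other plans are being compared.
    """
    v = np.asarray(stand_values, dtype=float)
    lo = np.asarray(stand_min, dtype=float)
    hi = np.asarray(stand_ideal, dtype=float)
    q = (v - lo) / (hi - lo)
    q = np.clip(q, 0.0, 1.0)   # assumption: values beyond the ideal add nothing
    return float(np.average(q, weights=areas))

# Hypothetical deadwood volumes (m3/ha) in three stands of 2, 1 and 3 ha.
rating = forest_rating(
    stand_values=[5.0, 20.0, 12.0],
    stand_min=[0.0, 0.0, 0.0],
    stand_ideal=[20.0, 40.0, 20.0],
    areas=[2.0, 1.0, 3.0],
)
# Stand-level values are 0.25, 0.50 and 0.60; the area-weighted mean is 2.8 / 6.
```

Because the minimum and ideal levels are fixed per stand, the same plan receives the same rating whether it is evaluated alone or together with other plans.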

The case study calculations indicated that management systems where the amount of final felling is small (CCF and AAF) often have the best social performance. Rotation forest management that aims at mixed stands or increased presence of broadleaves was better than conifer-oriented forestry. The results agree with the previous studies that compare different management systems of Finnish forests (Pukkala 2017; Pukkala 2018a; Díaz-Yáñez 2019; Eyvindson et al. 2020). Several recent studies have shown that an increasing area of continuous cover management and other retention practices mitigates the negative effects of clear-cut forestry on biodiversity (e.g. Mönkkönen et al. 2018; Baskent 2020; Eggers et al. 2020).

The results also showed that a low discount rate leads to better social performance. However, the discount rate is not a free choice when landowners aim at maximizing the total economic benefit of their investments. The discount rate used in forest management should depend on the profit of alternative investments. Continuous cover management is less sensitive to the adverse effect of an increasing discount rate on the level of ecosystem services (Pukkala 2017). Legal limits for clear-felling age and the lowest allowed stand density are means that society may use to prevent the detrimental effects of high discount rates on social benefits. Other means include subsidies and taxes.

The case study analyses do not necessarily indicate which type of management is the best for society. This is because all the management plans were single-objective and all plans (except the no-cutting plan) maximized the economic benefit of the forest landowner. A recent study by Díaz-Yáñez et al. (2020) showed that multi-objective optimization might be more efficient than single-objective management in a simultaneous production of several ecosystem services (Langner et al. 2017; Felipe-Lucia et al. 2018; Pohjanmies et al. 2021). It has also been suggested that a mixture of different management systems, or a mixture of managed forest and set-aside areas, is required to balance between conflicting management objectives and ecosystem service indicators (Mönkkönen et al. 2011; Mönkkönen et al. 2014; Triviño et al. 2015; Pohjanmies et al. 2017; Díaz-Yáñez et al. 2019; Baskent 2020; Eggers et al. 2020). Therefore, future studies are needed to find the forest management that maximizes the social benefits of forests. These studies may also calculate the losses in private benefits at various levels of social benefits. Moreover, the influence of the instruments that could be used to direct forest management towards good social performance should be addressed in future studies.