10.1 Introduction

It has long been known that the distribution of plant species in space depends on site factors. Humboldt and Bonpland (1807) described the dependence of plant species on altitude in tropical countries, i.e. across a temperature gradient . The year 2013 was the centenary anniversary of the publication of Braun-Blanquet’s first paper in the field of vegetation science (Braun-Blanquet 1913), in which he asserted that chemical properties of soils, specifically acidity and content of nutrients, are the prime factors that determine the distribution patterns of plant species. Based on these findings, Ellenberg (1950) developed a system to indicate soil conditions on agricultural fields based on the weed species growing between the crops. He assigned ordinal values ranging from 1 (little) to 5 (much) for weed species for five main site factor groups: soil acidity (pH), nitrogen (N) availability, water supply, temperature and continentality. The arithmetic mean of the indicator values for all weed species at a given place was used as an indication of site condition.

Later, Ellenberg extended the concept to all Central European vegetation types and plant species, respectively, and based the system on a 9-point scale (Ellenberg 1974). This concept of indicator values has been improved in recent years (Ellenberg et al. 2001, 2003) and has been adapted for other countries, such as Switzerland (Landolt 1977; Landolt et al. 2010 ), Great Britain (Hill et al. 1999), Hungary (Zólyomi et al. 1967; Soó 1980; Borhidi 1995) and Poland (Zarzycki et al. 2002).

Today, indicator values remain a very popular tool for the integrated assessment of site conditions. For example, Persson (1981) used indictor values to support the interpretation of ordination diagrams.

Without detracting the credit these works deserve, it should be noted that the underlying concept suffers from several deficits:

  1. 1.

    The assignment of indicator values to plant species is primarily based on the personal experience of the investigators. They “reflect the opinion of a (few) experts” (Wildi 2016) rather than being a result of the analysis of measured habitat conditions. Direct measurements have been used only marginally to define the indicator values, as systematic measurements of site conditions in phytosociological relevés were very rare. The comparison of mean indicator values with measurements of certain factors has been applied a posteriori only by way of example (Ellenberg et al. 2001).

  2. 2.

    The calculation of arithmetic means of indicator values is problematic, as ordinal variables should not be averaged. Möller (1992) suggests the use of medians instead of arithmetic means. Averaging the indicator values of the species occurring on a given plot is permissible only under the (implicit) auxiliary hypothesis that Ellenberg’s indicator value classes have (at least approximately) equal class widths.

  3. 3.

    Furthermore, there is debate as to whether either averages weighted or unweighted by cover should be used (Rodenkirchen 1982). The idea is that a species is to be expected to have a higher cover value when growing close to its ecological optimum . But cover also has a strong species-specific component: some species—orchids, for example—never occur at high cover, while others often do. The difference between averages weighted and unweighted by cover, however, depends on the evenness of the vegetation of the plot: for plots with high evenness, the difference is small because all species have more or less the same weight. Relevant differences are expected only in plots with a pronounced dominance structure. However, using cover as a weight for a weighted average is problematic in any case: measurement of the specific response of a certain species to the site factor under discussion would be necessary to determine the width of the ecological amplitude (Peppler-Lisbach 2008). Species with very narrow ecological amplitude should receive a higher weight than species that occur over a wider range of ecological conditions. But such a measure is completely lacking in all indicator systems to date. The cover-weighted average is only a substitute for this shortcoming.

  4. 4.

    Mean indicator values have also been used for environmental monitoring . Ewald et al. (2013) showed an increase in mean N values in forests in Central Europe over a period of nearly one century. Ewald and Ziche (2017), however, showed that Ellenberg’s mean N values are positively correlated with several nutrients but not with N concentration in topsoil . So the question arises: which measurable environment factor caused the increase of the N values?

Several attempts have already made to overcome these shortcomings: a posteriori calibration tries to justify the use of indicator values (Ellenberg et al. 2001). In most cases, the expected correlation is significant even though there is a low coefficient of determination. In some cases, however, the sign of the regression coefficient is even opposite to expected (Wamelink et al. 2002).

Hill et al. (1999), for example, recalibrated Ellenberg’s indicator values based on the national database of British vegetation samples. Ewald and Ziche (2017) analysed which physical units correspond to the mean indicator value for nutrients.

Nevertheless, none of these approaches addresses the major concern: the indicator system should indicate measurable properties of the soil (and other habitat variables) directly. Fischer (1986) used ordination methods to predict soil properties (pH) directly from vegetation relevés. However, prior to the second National Forest Soil Inventory (NFSI II) in Germany, data required to follow this approach was limited to small datasets covering only small spatial ranges and few vegetation types.

The goal of this paper is to create a species-based indicator system to assess soil properties . The indicator system is based on intensive soil analyses and vegetation surveys that yielded the German NFSI II data.

10.2 Climate, Soil, and Vegetation Data

The NFSI II recorded soil data as well as vegetation relevés on a regular grid of 8 km grid size at 1838 sampling points (BMELV 2006; Wellbrock et al. 2016). Sampling was conducted between 2006 and 2008. The vegetation was sampled on an area of 400 m2 within a 30 m circle around the central soil pits of the plots. On the NFSI plots, 819 vascular plant species were recorded overall. 425 plant species occurred on more than 4 plots. Nomenclature of species follows Jansen and Dengler (2008).

Soil variables that were accounted for in our work included pH (H2O), pH (KCl), pH(CaCl2), cation exchange capacity and the stocks of total nitrogen, total phosphorous (aqua regia extracts), organic and inorganic carbon and the cations Ca , Mg , K, Na, Al, Fe , Mn and H. Furthermore, the available field capacity was estimated based on texture analyses. Details on analytical methods can be found in HFA (GAFA 2014). Initial studies have shown that correlation between vegetation and soil is closest for the top 10 cm of soil (Ziche et al. 2016). Consequently, the first two depths (0–5 and 5–10 cm) were used for further analyses. Additionally, climate information was attributed to the NFSI plots using geostatistical methods such as ordinary kriging (precipitation ) or regression kriging (temperature, Ziche and Seidling 2010). Complete datasets with all recognized variables were available for 1619 sampling plots.

Within the framework of the intensive (Level II) monitoring of forest ecosystems in Germany,Footnote 1 another independent dataset was compiled that comprises soil data as well as vegetation relevés (Seidling 2005). This second dataset with 48 sampling points was used to evaluate our indicator system. All soil analyses were performed after 2005 and at depth intervals of 0–5, 5–10, 10–20, 20–40 and 40–80 cm.

Correspondence analysis (CA) using the programme CANOCO was employed to identify outliers (ter Braak and Šmilauer 2002). Species’ cover values were transformed according to Pudlatz (1975) to compensate the extremely skewed distribution of cover values . Rare species with a frequency of less than 5% were omitted from the analysis because they do not contribute to the general floristic structure of the vegetation. Only 20% of the omitted species have their main occurrence in forests (Schmidt et al. 2011). Forty-five percent are regarded as species of forests as well as open land. They are less typical for forests. Thirty-four percent of the species are not mentioned at all in the list of Schmidt et al. (2011). These are either typical species of open habitats that were geminated in the forest only by chance or taxa that could not be determined to species level (e.g. Quercus sp.). The CA indicates that no plots with outlying species composition were present. After omitting rare species, one relevé contained no other species and hence was omitted.

10.3 Environmental Impact on Species Composition

With the remaining 1618 relevés, forward selection of environmental variables using a Monte Carlo significance test with a canonical correspondence analysis (CCA) was applied to select soil variables that most affect the species composition of the vegetation . Forward selection of environmental variables in CCA is a stepwise regression approach: a variable is selected only if it contributes additionally to explaining species pattern compared to the variables already selected. The variable that explained the most and hence was selected first is base saturation (BS). C/N ratio and pH also show high predictive power. All variables appeared to significantly affect vegetation structure. This may be due to the large sample size. Moreover, significant relationships may not be relevant. For the purpose of indicators, the focus should be on the variables with the highest F value (Table 10.1).

Table 10.1 Ranking of environmental variables according to forward selection in CANOCO

Organic carbon is very highly correlated with total N. It was excluded from the analysis to avoid problems with collinearity.

The new indicator system is described below using base saturation as an example. Nevertheless, the model was developed for all variables listed in Table 10.1. However, the best results are expected for the first few variables in the list.

Finally, CCA was performed to identify the major gradients in vegetation . A biplot of the CCA with relevés and soil factors as environmental variables (Fig. 10.1) revealed a soil acidity gradient along the first axis with base saturation (BS) and pH increasing to the left and C/N ratio (CN) to the right. The second axis displays a climatic gradient with a positive correlation with altitude above sea level (Alti) and precipitation (Prec) and a negative correlation with temperature (Temp). The floristic structure of Germany’s forests is primarily determined by a soil acidity gradient and secondarily by a temperature gradient . Hence, soil properties associated with acidity (base saturation, pH, C/N ratio) can be expected to be good indicated in a species-based indicator system. The significant effect of temperature on forest vegetation demonstrates the high sensitivity of forests to the expected climate changes.

Fig. 10.1
figure 1

Biplot of the CCA with relevés (blue dots) and site factors

10.4 Modelling Species Response to Soil Properties

We fitted generalized linear models (GLM), applying second-order polynomial logistic regression (McCullagh and Nelder 1989). Soil variables serve as independent variables to the presence/absence data of the species, the dependent variable. These models describe the conditional probability of the occurrence of the species depending on the soil conditions:

$$ \log \left(\frac{p\left(V|x\right)}{1-p\left(V|x\right)}\right)={b}_0+{b}_1x+{b}_2{x}^2 $$

where p(V|x) is the conditional probability for finding species V at soil condition x and b 0, b 1 and b 2 are the coefficients of the regression equation. This approach is equivalent to fitting a symmetric, unimodal response curve to the occurrences of the species (Jongman et al. 1995). This is a parametric approach. Parametric methods have the advantage that the species response curve can be described using terms such as optimum and tolerance (Jongman et al. 1995). These values for base saturation are given in the appendix.

Unimodal response curves are the underlying principle of the commonly used CCA (ter Braak 1987; ter Braak and Šmilauer 2002). However, it remains unclear whether the unimodal model is appropriate to describe species response curves (Austin and Meyers 1996). While some authors state that it is “unlikely ecologically” that species have a multimodal response to a single variable (Roberts n.d.), others favour bimodal and skewed response curves (Zelený and Tichý n.d.). Therefore, we also applied generalized additive models (GAM), which allow any kind of response curve, to test for the occurrence of species with bimodal or skewed response curves. For this, we compared 95% confidence bands of the GLM with the GAM curves. If the GAM curve lies within the confidence band of the GLM, the latter is obviously an adequate description of the species response, and there is no reason to consider the more complicated GAM. Fitting of GLM and GAM was performed using the software package R (R Core Team 2016).

In our dataset, most species fit very well to the unimodal model. This was expected as the unimodal model is the basis for the widely used correspondence analysis. If a majority of species violates the basic assumption of this method, it would not perform as well as it does. Frequent and successful application of this method also suggests that the unimodal model is appropriate for most species. We were able to show that bimodal response curves are the exception rather than the rule.

Figure 10.2 shows some examples: Circaea lutetiana is a species with a clear, bell-shaped distribution within the base saturation gradient with an ecological optimum at 0.7, whereas beech (Fagus sylvatica) is an example of a species with wide ecological amplitude with respect to base saturation. Deschampsia flexuosa and Mercurialis perennis are species clearly preferring the lower or upper end of the gradient, respectively, with an optimum at the lower or upper end of the range, respectively.

Fig. 10.2
figure 2

Examples of species response curves with respect to base saturation (BS). The tics at probability 0 and 1 (bottom, top) show the location of relevés with and without the particular species, respectively. Circaea lutetiana has an optimum around 0.7 (70%), Fagus sylvatica is an indifferent species, Deschampsia flexuosa has a clear preference for low base saturations and Mercurialis perennis has a preference for high base saturation. Acer pseudoplatanus is a species with a bimodal response curve with respect to base saturation: there is one local maximum around 0.3 and another one 1.0. Picea abies is an example for a species with U-shaped response curve

Nevertheless, some species cannot be described adequately with a unimodal response curve. One example is Acer pseudoplatanus (Fig. 10.2e), which has one optimum around 30% base saturation and a second one at 100%. The GAM of this species lies outside the 95% confidence bands of the GLM. Such species cannot serve as indicator species in any indicator system as they do not have a clear optimum. Therefore, bimodal species were excluded from our indicator system. Fortunately, these cases are very rare, and most species in the dataset fit to the unimodal model.

In a few cases, the response curve was U-shaped instead of bell-shaped. This is the case if the coefficient b 2 in the logistic regression is significantly greater than 0. In fact, this is a special case of a bimodal response curve. Such species also do not qualify as indicator species and were excluded from the indicator system. An example is Picea abies (Fig. 10.2f).

10.5 Predicting Soil Properties by Species Composition

In the next step, we needed to derive the conditional probabilities of different values of the soil factor to be indicated \( p\left(x|\overrightarrow{V}\right) \) depending on the species combination \( \overrightarrow{V} \) found in the forest stand. This is performed by means of the Bayes’ formula (Fischer 1990, 1994):

$$ p\left(x|\overrightarrow{V}\right)=\frac{p\left(\overrightarrow{V}|x\right)\bullet p(x)}{p\left(\overrightarrow{V}\right)} $$

The multivariate conditional probability of finding the species combination \( \overrightarrow{V} \) at soil condition x is estimated from the univariate response curves assuming independence of the effects:

$$ p\left(\overrightarrow{V}|x\right)=p\left({V}_1|x\right)\bullet p\left({V}_2|x\right)\bullet \dots \bullet p\left({V}_n|x\right) $$

The prior probabilities of the soil conditions p(x) correspond to the frequency distribution of x in the area of investigation. Kernel density estimation with R function “density” was used to determine p(x). The prior probability of the species composition \( p\left(\overrightarrow{V}\right) \) is a constant at a given point in the landscape and hence does not need to be considered. The value of the soil condition x with the highest conditional probability \( p\left(x|\overrightarrow{V}\right) \) is then taken as the result of the indication.

The results of the application of the indicator model are curves describing the probability of different values of the soil factor on certain relevés based on the species growing at those points . Four examples from different regions in Germany are shown in Fig. 10.3 for the variable base saturation. In most cases, these profiles display a clear peak. The base saturation with the highest probability is the indicated base saturation value. Only in very few cases is the profile a flat curve indicating uncertain indication. An example is given in Fig. 10.3d.

Fig. 10.3
figure 3

Examples of probability profiles for relevés. The maxima of the curves correspond to the indicated base saturation (n, number of indicator species ). Relevé 100037 is an example of a flat profile with a bimodal distribution indicating uncertain indication

Comparison of the indicated and the measured base saturation in the independent reference dataset yields a quite high coefficient of determination of R 2 = 0.6 (Fig. 10.4). The second important variable according to forward selection of environmental variables (Table 10.1) is the ratio of organic carbon and total N (C/N ratio). With an R 2 of 0.38, nearly 40% of the variability of the C/N ratio can be predicted with the indicator system. With pH, the third variable in the ranking of the forward selection of environmental variables (Table 10.1), the coefficient of determination (R 2) was 0.40.

Fig. 10.4
figure 4

Regression between measured and indicated base saturation in the reference dataset

10.6 The WeiWIS Indicator System

We implemented the algorithm described above in a MS-ACCESS application called “Weihenstephan Wood Indicator System” or WeiWIS. WeiWIS combines vegetation relevés with a species’ calibration data and generates indicated values for habitat factors of all relevés and habitat factor profiles for selected plots (Fig. 10.5).

Fig. 10.5
figure 5

Structure of the WeiWIS tool. Users can input their own vegetation relevés. They are analysed with the species calibration data, and indicated values for all relevés and habitat factor profiles are generated

With WeiWIS as a new indicator tool, many unresolved problems with existing indicator systems become obsolete. The question as to whether the arithmetic mean or the median should be used to characterize a site does not apply to our approach, because the response curves themselves are considered, not only the maxima. This is also the reason why the question as to whether weighted or unweighted means leads to better result is not relevant here. In fact, species are weighted implicitly with the width of the ecological amplitude: species with a narrow and high peak have a higher weight in the system compared to species with a wider, flatter curve.

Furthermore, the resulting probability profile for each relevé gives an indication of the reliability of the result. A wide and flat profile suggests a less reliable result, probably due to a smaller number of species or a species combination that includes contradictory species.

10.7 Discussion

Although the application of indicator values is very popular, comparisons with measured values are quite rare in the literature. Michler (1994) found significant correlations between measured soil variables and mean indicator values with an R 2 of up to 0.2 (pH with mean R value, K+, Zn2+ and phosphate with mean N value). Total N was not significantly correlated with the N value. Ewald and Ziche (2017) found base saturation was the best variable to predict Ellenberg’s indicator value for nutrients (mN). They found a correlation of −0.55 between C/N ratio and mN, corresponding to an R 2 of 0.30.

Our indicator system, however, shows a significant correlation between measured and indicated base saturation, with R 2 = 0.6. This is a remarkable improvement compared to Ewald and Ziche (2017), especially since they used the same BZE II dataset as our study.

A major advantage of our indicator system is that it indicates soil properties (such as base saturation or C/N ratio ) directly and not via a dimensionless value (mean indicator value) that needs to be correlated a posteriori with the desired measurable parameter.

Several attempts have already been made to define indicator values for plant species on the basis of measurements. Wamelink et al. (2005) used the mean of pH values for all relevés where a species is present as an indicator value. However, this concept is biased by the extent of the range of the observed gradient. If the calibration dataset does not cover the entire range of habitat conditions that a species can occupy, the mean cannot correctly reflect the species’ optimum. Furthermore, the mean is influenced by the frequency distribution of the habitat factor. If some habitat conditions occur more frequently in the calibration dataset than others, these conditions will receive a higher weight for the definition of the indicator values. Gégout et al. (2003) overcame this problem by using GLM to define the ecological optimum . The value of the habitat factor with the highest probability is taken as the indicator value for the species. However, both approaches have their indicator based on the arithmetic mean of the indicator values of the species present in a plot.

A very different approach is the imputation method of Tichý et al. (2010). Here, the vegetation relevé for which environmental variables should be indicated is compared with the relevés of a reference dataset for which environmental measurements are available. The mean of the values of the measured environmental variables of floristically similar relevés in the reference dataset is used for indication. This approach requires a huge reference dataset where all vegetation types are represented with statically representative sample sizes. Otherwise, indicator values cannot be identified for all relevés. In contrast to our approach, Tichý’s method produces only one indicator value but provides no measure of the reliability of the value. With his recommended parameter (maximum distance = 0.5), only 22% of our relevés could be assigned a value, showing that our reference dataset with 1619 relevés is still too small for Tichý’s approach. With a maximum distance of 1, all relevés were assigned. But the variance of the estimated values was significantly smaller (p = 0.09) than the measured ones. Consequently, the imputed values are clearly biased.

Peppler-Lisbach (2008) based his indicator concept on niche overlap: the indicated value is the centre of the overlap of the niches of all species in a plot. Obviously, this concept can only be applied if all species in a plot have overlapping niches.

10.8 Conclusions

To date, all species-based systems to indicate site conditions, such as the famous Ellenberg indicator values, reflect “expert opinions only” (Wildi 2016), rather than an outcome of sound statistical analysis of measured data. With WeiWIS, we have developed an indicator system for soil properties that is based on a large, statistically representative sample of measurements. We have undertaken the step away from systems developed during the last half century that are primarily subjective towards a quantitative system that is based on a systematic sampling design and which follows a sound statistical approach.

Species-based indicator systems that indicate complex factors, like Ellenberg’s system, should not be a substitute for ecological measurements in environmental monitoring (Michler 1994), because changes, for example, in the mean N value, cannot be attributed to a concrete, measurable variable. However, this step is necessary to set up effective, sustainable planning in the management of habitats and to avoid misinterpretations. Our system avoids these problems because it directly indicates measurable values. The method can be applied to all environmental variables that effect species distribution.

Moreover, our indicator systems can help to estimate the conditions in the past, for which no measurements are available, and then compare them to the current situation within the same system.