1 Introduction

Based on changes in plant phenology, which deals with the timing of various recurring phases of plants, conclusions about the impacts of climate variations and change can be drawn. In recent years, scientific attention towards determining changes in and forecasting the development of phenology has grown (Fu et al. 2014). Numerous analyses of phenological data have demonstrated differences in the climate sensitivity of species all across the planet (e.g., Menzel et al. 2006; Richardson et al. 2013; Thackeray et al. 2016). Learning more about the complex interactions between vegetation and meteorology is of relevance, e.g., for creating high-quality earth system models (ESM). Due to various feedback mechanisms (e.g., albedo, evaporation, …) between biosphere and atmosphere, a good representation of vegetation is a necessity for future climate change simulations (Richardson et al. 2013). So far, dynamic modeling of vegetation in ESMs is not the standard.

Previous studies dealing with the modeling of phenology focused on developing empirical models using mainly air temperature as predictor. To investigate the relationship between the onset of a phenological phase and temperature, the average or sum of the measured temperature over a fixed time period, called ‘Growing Degree Days’ (GDD), is calculated. Many models employ a temperature threshold, typically excluding values below 0 °C, as criterion for selecting the daily temperature beyond that threshold over a chosen interval (Chmielewski et al. 2005; McMaster 2005; Črepinšek et al. 2006). Most studies consider an accumulation interval from a fixed date to the day of the year (DOY) when the particular phenophase occurs for the first time in a given year or choose the month(s) before or in which the DOY occurs on average (Chuine 2000). Using more months or larger intervals tends to result in better models (e.g., Kolodziej and Frühauf 2008; Scheffler and Frühauf 2012; Jochner et al. 2013).

In addition, there are so called process-based models that are assumed to provide more realistic predictions of phenological characteristics due to their consideration of plant-physiological processes. These models consider, e.g., chilling hours (CH) or chilling units (CU) because temperate plants often require a species-specific number of days below a critical temperature before dormancy is terminated and growth can begin (Chuine et al. 1999; Islam et al. 2016). Despite their higher complexity, process-based phenological models are still outperformed by simple empirical models (Basler 2016). For example, Linkosalo et al. (2006) compared four models for different tree species: the more complex models with more than one climate predictor were outperformed for all test plants by a simple temperature accumulation model, in which all temperatures beyond a certain threshold are summed up. However, in the future, such two- or more-phase models (that consider GDD and CH) are expected to provide better predictions concerning phenology, because current and particularly future winter temperature does not always fulfill the coldness requirements every year, as it is assumed for pure GDD or simple empirical models (Chuine et al. 2016).

In addition to different model approaches, there are also varying species responses to temperature changes at different rates and some species show different responses at different sites (Primack et al. 2009). An overview of the spatial and temporal variability of phenology in Germany can be found in Menzel et al. (2001). A part of the spatial variability can be attributed to the relationship between altitude and phenology. However, it was found that the gradient (DOY change/100 m) varies noticeably over different regions, elevation levels, and plant phenophases (e.g., Rötzer and Chmielewski 2001; Dittmar and Elling 2006; Larcher 2007; Schleip et al. 2009; Ziello et al. 2009; Cornelius et al. 2013 and articles cited therein). The same is true for different phenological phases during the vegetation period: spring phases, which mostly depend on temperature, can be assessed quite well, while the modeling of autumn phases is less successful. One possible explanation for the unsatisfactory representation of autumn phases is that they also depend on water supply and soil properties that are not taken into account in most model approaches (Chmielewski and Rötzer 2001; Estrella and Menzel 2006).

This study deals with the relationship between phenology and several meteorological variables in Bavaria, Germany. Its main aim is to develop a reliable multivariate statistical model for the prediction of the onset of phenophases. As described above, there are several approaches for selecting and preparing meteorological data. Most of them require additional information like plant-specific thresholds, CUs, gradients or elevation to model phenology. We aim to find a powerful method that can be applied without additional information for specific plants, that does not require large data sets, and that fits local phenological characteristics as good as possible. The focus of our study is on the comparison of various methods for selecting and preprocessing the climate predictors in the light of their regional performance and limitation. In contrast to the widespread temperature-based models, we investigate whether the inclusion of additional meteorological parameters can improve the modeling results. A second objective is to predict future phenological characteristics in Bavaria under climate change conditions as given by high-resolution regional climate model projections.

The next sections describe the considered data and methods, especially the modeling approaches to be compared with each other. Section 3 is dedicated to the results. The discussion of our results is undertaken in Sect. 4. The main conclusions are drawn in Sect. 5.

2 Data and methods

2.1 Data

We use two observational datasets provided by the German Weather Service (DWD) covering the period 1950–2013: (1) phenological data from 1154 Bavarian stations, collected from volunteers for various plants and phenological phases, indicating the DOY, on which the plant reaches the phase for the first time. The volunteers get a guide with descriptions of the phases and are instructed to watch multiple plants in the stations’ surroundings to keep the monitoring standardized. The quality of this data set is verified by the DWD using absolute limit testing and a spatial consistency test (Kaspar et al. 2015). This data set is widely used for phenological analysis (e.g., Menzel et al. 2001). (2) Meteorological data from 225 stations across southeastern Germany, which provide up to 13 climatic variables in daily resolution (see Table 1). Stations that do not provide all necessary variables were excluded from the multivariate modeling approach.

Table 1 Correlation coefficients between input variables and the four leading principal components as well as their eigenvalues and explained variance. Coefficients above (below) 0.5 (− 0.5) are italicized

To study phenology changes until 2100, we use daily meteorological variables from the state-of-the-art RCM REMO2009 (REMO), which is driven by observed historical greenhouse gas concentration in period 1950–2005 and afterwards by two representative concentration pathway (RCP) scenarios, i.e., RCP 4.5, and RCP 8.5 assuming a CO2 equivalent of 630 ppm and 1313 ppm, respectively, until 2100. For a multi-model ensemble of global climate models, this results in a global mean warming of about 1.8 °C (3.7 °C), compared to the average global surface temperature of 1986–2005 (Jacob et al. 2001; Moss et al. 2008; IPCC 2013; Jacob et al. 2014).

2.2 Data preparation

Figure 1 shows the complete workflow from data preparation, building local anomalies to address spatial and temporal autocorrelation, choosing an accumulation interval, choosing a start day for the accumulation, selecting suitable meteorological variables, and trying some regionalization for optimizing the resulting model and calculate phenological onsets in the future with two climate model scenarios. The following sections describe each step in more details. The figures and tables cited in Fig. 1 show the respective result of this step.

Fig. 1
figure 1

Workflow of the complete analysis from data preparation with connecting both data sets spatially and building anomalies, choosing an accumulation interval and getting an approximation of the start value of the accumulation interval, selecting the suitable variables and trying different regionalization to get the resulting model

The locations of the phenological and meteorological stations are not congruent. Therefore, we connect each phenological station with its nearest neighbor (NN) meteorological station by considering two specific prerequisites: first, the maximum distance between the meteorological and phenological station is limited to 20 km and, second, more than 30 years of data for pairs of meteorological and phenological station data are available. The maximum distance was chosen as a compromise to ensure that still enough pairs of stations are available for analyses while the climatic conditions at the stations are sufficiently similar.

Figure 2 a illustrates the mean flowering date for our main test plant Forsythia suspensa (forsythia) over the test period with the location of the phenological and their assigned meteorological stations. Note that Forsythia suspensa is a wild plant that exhibits large geographical variations in terms of flowering. We focus on forsythia as it has a good data coverage and represents the indicator plant of the phenological phase ‘Erstfrühling‘(early spring) as defined by Deutscher Wetterdienst (2016). Forsythia is also known to be sensitive to temperature. Figure 2 a also demonstrates the uneven distribution of the mean DOY of forsythia flowering, mainly attributed to topography in the study region. In the elevated northeastern and southern parts of Bavaria flowering starts much later than in other regions and early flowering primarily takes place in the lower parts of northwestern Bavaria. Although the yearly values from station to station as well as the range from year to year are subject to strong fluctuations, there is a significant negative trend in the mean annual onset of flowering during the 1950–2013 period. On average, forsythia has bloomed about 1.8 days earlier per decade since 1950, i.e., more than 10 days in total (Fig. 2b).

Fig. 2
figure 2

Interpolated (IDW) long-term mean onset of flowering of forsythia for 1950–2013 with the location of the used meteorological and phenological stations in Bavaria and their nearest neighbor relationship (a), spatial-mean time series and range over all Bavarian stations including the number of observations per year (b)

To eliminate the given topographic effects between each meteorological and phenological station pair as well as among phenological stations, we transform all data sets into anomalies. Daily meteorological data are converted by subtracting the long-term daily mean of each station from the original value. The annual phenological data are converted by subtracting the long-term mean DOY of each station from its annual DOYs. We consider all available data for each station. When relating phenology to climate by means of our statistical model, at least 30 data pairs are required per station and the mean values used for constructing the anomalies are derived from the same years available for climate predictors and phenological data. These anomalies are robust to topographic effects (Pollinger et al. 2017). The time scale of the meteorological variables is adjusted to the annual phenological data by means of accumulation over various intervals, testing three different approaches: the first method uses the observed DOY of each phenological station and year as last day of the selected accumulation interval for the climate predictors. This approach is the most data-adaptive one and reflects the genuine statistical relationships between climate and phenology, but it cannot be applied to future periods for which phenology is unknown. However, it can serve as a reference for the two approximations considered here. To approximate regionally adapted predictors, we define the mean DOY of each station to be the endpoint of the accumulation interval for any future period (second method). For the third method, the mean value of the plant-phase-combination (PPC) from all stations and years is used, suppressing regional differences. The length of the accumulation intervals is varied in 15-day steps between 15 and 120 days prior to the phenological DOY.

2.3 Methods

To explore and quantify the statistical relationship between inter-annual variations in DOY and meteorological variables, classical correlation analysis is employed as a first-step diagnosis. In addition, a cross-validated multiple regression model was tested to find the best statistical transfer function for prediction (cf. Paeth 2011; Awoye et al. 2017). As this kind of algorithms strongly depend on the sample and bare the risk of overfitting triggered by the multi-collinearity between the predictors, a principal component analysis (PCA) is carried out based on the correlation matrix of the climate predictors. Since the aim of creating a statistical model is to have high predictability by using as few predictors as possible, one representative original variable is identified for each of the leading principal components (PCs). Additionally, the reduced variable space ensures a larger number of meteorological stations to be used in the multiple regression model.

Besides the modeling approach using all available stations across Bavaria, we test the potential of several spatial classification methods in order to apply the model to homogeneous sub-regions. The underlying hypothesis is that statistical transfer functions fitted to individual stations or smaller sub-regions should outperform a generalized approach for the whole study area. Such homogeneous sub-regions are built by means of three alternative spatial classification methods:

  1. 1.

    A combined hierarchical cluster analysis with k-means regroupment approach (e.g., Paeth et al. 2005) is applied to the selected long-term mean climate predictors at each meteorological station (here: mean air temperature, precipitation, mean wind velocity, snow depth). The cluster analysis groups together those stations with lowest Euclidian distance from each other, leading to homogeneous sub-regions that distinctly differ from other sub-regions.

  2. 2.

    Meteorological stations are grouped to sub-regions according to elevation ranges.

  3. 3.

    Two existing and quite similar geographical classifications based on a set of natural landscapes occurring in Bavaria are used: One from the Bavarian State Institute for Environment (LfU) and one from the Bavarian Institute for Forestry Seed and Plant Breeding (ASP)).

To restrict the statistical model to relevant climate predictors for the considered phenological phases, a cross-validated multiple linear regression analysis is carried out. Out of the total sample, 200 randomly selected data pairs are excluded for the cross validation by means of bootstrapping. The remaining sample is used to train the regression model. This procedure is repeated over 1000 iterations, each time excluding a different random subsample. This leads to the mean characteristics and uncertainty range of the statistical model. The statistical model is first applied to the flowering date of forsythia and tested using the different model setups and data preprocessing described above. In total, the data of 106 forsythia stations could be taken into account. The trained regression model is then applied to future climate predictors from regional climate projections under two emissions scenarios in order to assess future changes in flowering dates. We also employ this method in terms of five other spring phases (flowering of Syringa vulgaris, Cornus mas, and Philadelphus coronarius as well as leaf sprouting and flowering of Robinia pseudoacacia).

3 Results

3.1 Bivariate correlation

Figure 3 illustrates the strength and direction of the statistical relationships between all considered climate predictors and the flowering date of forsythia. For each meteorological predictor and each accumulation interval prior to the flowering date, the figure shows one boxplot of correlation coefficients calculated for all phenological station across Bavaria. The negative correlations calculated for all temperature variables point out that warmer periods lead to an earlier flowering and colder periods to later flowering. The closest correlation is found between daily mean temperature (TM) anomalies and DOY. Mostly significant correlation coefficients are also found for the other thermal predictors and the shorter accumulation intervals of snow depth, whereas those of the non-thermal predictors do not exceed the thresholds of statistical significance (|r| = 0.35 for α = 5%) and even vary in sign. Among the different accumulation intervals, the predictors built over the 45-day period prior to DOY attain the strongest correlation. Therefore, this accumulation interval is chosen for the subsequent analyses.

Fig. 3
figure 3

Boxplots of correlation coefficients between the anomalies of onset of flowering of forsythia and the anomalies of climate predictors accumulated over various intervals. Thresholds of statistical significance are marked as horizontal lines

3.2 Multiple regression

When combining the various climate predictors in a multiple regression model, the multi-collinearity of the meteorological variables has to be taken into account in order to restrict the model to a minimum number of relevant predictors. The correlation coefficients among all predictors are depicted in Fig. 4. The inter-correlation is quite strong (|r| > 0.8) among all thermal variables and, e.g., between air pressure and cloudiness considering the 45-day accumulation interval. Thus, there is need to reduce the number of climate predictors. In the first step, this is achieved via cross-validation of the multiple regression model. In the second step, the predictor set is reduced by means of a principal component analysis.

Fig. 4
figure 4

Correlation matrix of the climate predictors accumulated over 45 days before the flowering date of forsythia. For acronyms of variables, refer to Table 1

In Fig. 5, we present results from multiple regression analyses using different numbers of climate predictors accumulated over a 45-day interval. The aim is to find a statistical transfer function for the flowering of forsythia with as few predictors as possible. In addition, three approaches for the definition of DOY are considered: measured DOY per station and year, long-term mean DOY per station, mean DOY over all stations and years (see Sect. 2.2). The adjusted coefficient of determination (R2) and the root mean square error (RMSE) are shown on the left side as a function of amount of admitted predictors. The right side presents the predictors selected as being the most important ones depending on the predefined total number of predictors. The final selection is marked by the minimum RMSE after cross-validation and represents a basic quality criterion of the statistical model, i.e., optimizing its performance with as few predictor as possible. Using the measured DOY (Fig. 5a), mean air temperature represents the leading predictor, explaining 60% of the total variance. Any additional predictor only slightly contributes to the explained variance. When all variables are included in the regression model, R2 is enhanced by no more than 2.3%. However, the RMSE improves from around 13 days based on one predictor to 8.6 days with all meteorological variables used in the model. The same performance, however, can be achieved using only seven predictors. Consequently, the predictors added afterward (e.g., vapor pressure, cloudiness and others) do not provide additional information to the statistical model. The method that uses the time- and site-specific measured DOY reaches a higher R2 (approx. 8%) and a slightly smaller RMSE for all numbers of predictors in the model than the methods that approximate flowering data (Fig. 5a–c). It can also be stated that the station-mean DOY approach (Fig. 5b) has a 2% higher R2 than the method losing the individual stations’ information (Fig. 5c). Thus, the method with most stringent adaptation to the data achieves the best performance, as to be expected.

Fig. 5
figure 5

Total explained variance and standard error of the multiple regression model for different admitted numbers of climate predictors accumulated over 45 days (left) and the leading predictors selected by each of these models with the rows referring to different upper limits of admitted predictors (right), as observed per individual station and year (top), using each stations’ mean DOY (middle), and using the mean DOY over Bavaria (bottom). For acronyms of variables, refer to Table 1

The development of quality criteria, when adding new meteorological variables to the model, is the same in all three DOY approaches. In terms of the selected predictor combination for each method, it becomes apparent that the selections per model step resemble each other (Fig. 5, right panels): mean air temperature is the most important predictor in the model. For the first and second method, the best model with two variables is the one comprising mean and minimal temperature, whereas for the third method, the combination of mean wind velocity and maximum temperature is selected. Figure 5 also indicates that if a meteorological variable is once chosen as a predictor, it persists when the number of predictors is increased, except for the model with eight meteorological variables that is more disruptive. In addition, almost all climate predictors are statistically significant. This demonstrates that the relationship between climate and phenology is quite robust with respect to different DOY approaches, different locations across Bavaria and different predictor sets. As the predictive skill of the statistical models does not differ substantially among the three definitions of the flowering date, the station-mean DOY approach is used for further analysis because it allows for the application to future periods.

To reduce the multicollinearity of the meteorological variables prior to the predictor selection by the cross-validated multiple regression model, a principal component analysis with all 13 input variables was performed. Table 1 shows the correlation coefficients between each input variable and the first four principal components (PCs). Coefficients above (below) 0.5 (− 0.5) are highlighted. The first PC mainly represents the thermal variables, all of which show very close correlations. The second one can be interpreted as an indicator of hygric variables, such as precipitation and relative humidity, while the third one stands for mean and maximum wind velocity and represents the atmospheric dynamics, as, e.g., mirrored by the North Atlantic Oscillation. For the fourth PC, only snow depth (SH) exceeds the threshold. In the first four PCs, each meteorological input variable exceeds the threshold only once, allowing for such a clear physical interpretation of the PCs. Considering their eigenvalues (threshold given by Jolliffe 1972) and explained variance, the leading four PCs account for 85.6% of the total variance given by the 13 meteorological input variables. Thus, the model dimension can be reduced from 13 to 4 predictors and still the main characteristics of Bavaria’s climate are retained. To build the four PCs, however, the data of all 13 input variables are still required. To retain more pairs of meteorological and phenological stations, a representative variable for each PC was selected. This leads to the following predictors for the best model approach: (1) mean air temperature, (2) precipitation amount, (3) mean wind velocity, and (4) snow depth (cf. Table 1). Precipitation was selected because data for cloudiness, sunshine duration, and relative humidity were not available for as many stations as for precipitation. In addition, there is no noticeable difference between the results derived from cloudiness and precipitation (not shown). The integration of snow depth as model predictor for forsythia is obvious because it affects the local surface radiation balance and serves as storage of moisture and heat on the ground. The effect of mean wind velocity is less apparent. The use of wind as a predictor for phenology can be justified by the effect of wind in terms of wind chill and evapotranspiration and as an expression of large-scale modes of variability like the North Atlantic Oscillation. In addition, mean wind velocity was also found to be one of the leading climate predictors in Fig. 5 (right panels). To approve this selection, we performed another model run without wind as predictor for forsythia and achieved an 11% lower R2 and a 1-day larger RMSE. Therefore, the final statistical model is based on all four predictors. Considering the whole study area of Bavaria, the multiple regression model with these four predictors accumulated over 45 days and using the station-mean DOY approach achieved an R2 of 51.9% and an RMSE of 9.4 days (Table 2, top row). Note that using the four leading PCs as predictors did not improve these values, although all 13 variables are included to build the PCs.

Table 2 RMSE and R2 for the Bavarian model (first row) and each class of the regional model. Classes containing less than 30 records are not taken into account in the multiple regression analysis

3.3 Regional models

In the next step, we attempt to enhance the model’s skill by subdividing the study area in order to tailor the model to specific landscapes, altitudinal belts and climate subzones. The tested regional models achieve a rather similar performance with the RMSE varying by no more than 0.11 days. As the geographical classification of natural landscapes from LfU performs slightly better than the other regional classifications, its RMSE is illustrated in Fig. 6b and compared with the model derived from all Bavarian stations (Fig. 6a). Considering geologic, morphologic, hydrologic and climatic criteria as well as land-use properties, 18 different natural landscapes are classified for Bavaria (Ssymank 1994; Bayerisches Landesamt für Umwelt 2016). For those classes containing more than 30 data points, RMSE and R2 are listed in Table 2. The table and map reveal that the quality criteria vary considerably from class to class. Class 17, i.e., a low mountain range in the southwest of Bavaria, has a relatively poor performance with R2 being below 36% and an RMSE of about 17 days. Classes 5 and 12 in the northwestern and northeastern parts of Bavaria exceed an R2 of 70%.

Fig. 6
figure 6

Interpolated (IDW) mean RMSE in days per station for the Bavarian model (a) and for the regional model based on the LfU’s landscape classification (landscape class numbers as in Table 2) (b). Note that the number of stations is smaller than in Fig. 1 due to the predictor selection criteria

The mean R2 and RMSE of all sub-regions are 58.3% and 9.9 days, respectively. Compared with the Bavarian model without regionalization, this implies that R2 is about 6% higher in the LfU classification approach, whereas the RMSE is about 0.5 days lower in the Bavarian model (Table 2). However, comparing the mean RMSE per station, the regional model has an RMSE average of 8.61 and the Bavarian model of 8.72 days per station. The main difference between the spatial RMSE patterns in Fig. 6 is located in the northeast and results from one missing station in the regional statistical model because there is insufficient data for this landscape class to derive the statistical transfer function. Figure 6 also shows that the interpolated station RMSE has two ‘hotspots’: in the southwest of Bavaria, a few stations have a mean error of more than 2 weeks. The station with the highest mean RMSE of 24 days, however, is located in the north of the study area near the city of Bamberg. Note that several stations nearby have prediction errors of less than 8 days. Apparently, prediction quality is strongly affected by the quality of individual station data but relatively independent of spatial patterns or regional classes.

After testing, analyzing and comparing all different statistical model approaches, we chose the one with the station-mean DOY, the accumulation interval of 45 days, the four predictors derived from PCA as mentioned above and based on the LfU landscape classification as the best tool and apply this model to all forsythia stations as well as to other indicator plants, leading to specific statistical transfer functions for each plant. The aim of applying the model to other plants is to test the hypothesis that our statistical model is sufficiently robust and universal to yield reliable results for other plants apart from the one the model is optimized for. Thus, a low R2 and high RMSE arising from this approach does not necessarily imply a missing link between climate and phenology, but may point to the fact that other climate predictors and accumulation intervals are relevant to the specific plant. The results for all considered indicator plants are summarized in Table 3. Besides average and standard deviation of the measured DOY for each PPC, Table 3 shows the mean R2 and RMSE per station. Out of the six considered PPCs, flowering of forsythia achieves the highest R2. Syringa vulgaris and Cornus mas, whose mean flowering date resembles the one from forsythia, follow with an R2 of 41.5% and 39.2%, respectively. In contrast, the selected four climate predictors barely affect the flowering of Philadelphus coronarius and the flowering and leaf sprouting of Robinia pseudoacacia. Regarding the mean RMSE, the value for Syringa vulgaris with 6.6 days and for Robinia pseudoacacia with 7.8 days is even better than for forsythia. It is important to note that all prediction errors are substantially smaller than the observed year-to-year standard deviation of DOY. This implies that the multiple regression model provides some gain of information for every considered plant’s DOY. Due to the relatively low R2, the PPCs of Philadelphus coronarius and Robinia pseudoacacia are not considered for the subsequent analysis.

Table 3 Comparison of the RMSE and the R2 for different indicator plants and phenological phases, using the best statistical model approach (see text). The number of stations refers to the phenological stations that fulfill the selection criteria described in the text

3.4 Future changes in phenology

In the last step, the future development of the flowering dates of forsythia, Syringa vulgaris and Cornus mas are assessed until the end of the twenty-first century based on meteorological predictors taken from REMO RCP 4.5 and RCP 8.5 data. The observed and simulated time series of the spatial mean DOY of forsythia flowering show remarkable similarities with respect to the mean, variability, and trend until 2013 and the negative trend of the flowering date goes on until the year 2100 (Fig. 7). Compared with the mean onset of flowering in the reference period (101st DOY), flowering will start substantially earlier under the RCP 4.5 and RCP 8.5 scenarios, 23 days on average during the 2071–2100 period in the business-as-usual scenario. In Fig. 8a, the differences between consecutive future 30-year mean flowering dates of forsythia and the reference period are mapped. In both scenarios, flowering begins continuously earlier everywhere in Bavaria when greenhouse gases increase over the twenty-first century. The few positive values in North-Eastern Bavaria during the first 30-year period under RCP 4.5 scenario result from the occasionally wetter and cooler years in this area simulated by the climate model. However, going further into the future the model output becomes nearly a function of temperature because rapidly rising temperature overcompensates any potentially counteracting effects. The RCP 8.5 scenario reaches a flowering state exactly 30-year period earlier than the RCP 4.5 scenario. The spatial differences of future flowering changes in different regions across Bavaria is striking. Strongest changes are found for northwestern Bavaria. Here, forsythia is predicted to flower up to 25 days earlier compared with the reference period 1961–1990.

Fig. 7
figure 7

Observed and simulated time series of the spatial-mean flowering date of forsythia in Bavaria from 1950 to 2100

Fig. 8
figure 8

Differences between future 30-year means of flowering date under RCP4.5 (left) and RCP8.5 (right) scenario and the reference period 1961–1990 for forsythia (a) and for Syringa vulgaris and Cornus mas (b)

The extent and spatial characteristics of future phenological changes also depend on the considered PPC (Fig. 8b). While the temporal shift of Syringa vulgaris is lower, ranging from 5 to a maximum of 15 days, Cornus mas resembles the climate change signal of forsythia with 10 to more than 20 days earlier. Regarding the most affected regions, the southwestern part of Bavaria and isolated stations in the southernmost part along the northern Alps must be mentioned for Syringa vulgaris and the upper middle part of Bavaria (Middle Franconia) for Cornus mas.

4 Discussion

The present study is dedicated to climate-driven phenological changes in Bavaria, southern Germany. Among various PPCs available on behalf of the observational data, we mainly focus on the flowering of forsythia because of its well-known link to temperature. In a study by Henniges and Chmielewski (2006), forsythia flourished about 2 days earlier in the urban area of Berlin than in the surrounding rural area. Franken (1955) and Bernhofer (1991) found even greater temporal differences between urban and non-urban areas around other cities. This also holds for other plants and phenological phases (Rötzer et al. 2000; Jochner et al. 2012). The effect of earlier spring phases in cities is mainly attributed to higher temperatures caused by the urban heat island effect (Lakatos and Gulyás 2003; Neil and Wu 2006).

Other authors accumulate the absolute values of temperature over several months and/or divide their study region into smaller areas with similar natural landscapes to better fit their model to topographic or other landscape characteristics (e.g., Chmielewski and Rötzer 2001; Schröder et al. 2006; Larcher 2007; Kolodziej and Frühauf 2008; Scheffler and Frühauf 2012). Here, we demonstrate that it is possible to develop statistical transfer functions between climate and phenology based on data from many different sites by using climate predictors in the form of anomalies (cf. Stöckli and Vidale 2004). The main objective of our statistical model is to predict the changes in phenology and not to forecast the absolute value. Therefore, spatial differences at station locations can be dealt with by one single model that, hence, refers to a much larger sample of data pairs. As a consequence, PPCs with lower data coverage can be treated as well when it comes to the assessment of future phenological changes at the regional scale.

Yet, some limitations of our modeling approach must be discussed: the first strong assumption that had to be made is the spatial interpolation of the meteorological data. The threshold of 20 km that limits the maximum distance between two stations could be reduced to bound the phenological stations closer to their local climate. Unfortunately, data coverage did not allow for a shorter distance between meteorological and phenological stations, even in the light of the high data amount in Germany. In order to address the sensitivity of our results to the interpolation method, we also tested the inverse-distance weighted interpolation, but the resulting correlations were much lower for all variables and intervals than for NN. Note, that for the large set of meteorological variables required for our analysis over a long data period no alternative gridded data set or high-resolution reanalysis is available.

The length of the accumulation interval was optimized in our study by systematic testing. It was implicitly assumed that the plants’ chilling needs are generally fulfilled at the given DOY. While for the flowering of forsythia the 45-day interval reaches the best model performance, this interval probably differs for other plants and phases. Individual intervals for each PPC might improve the individual results, but contradict our aim of a general and transferable model approach. Another approximation that had to be made was the definition of the start value of the accumulation interval. Here, the station-mean DOY was used and achieved reliable results compared with the measured DOY for each station and year. An iterative approach would be another possibility to approximate the DOY for future periods when the DOY is not observed. However, this is approach is computationally expensive, and it is not evident that a convergent solution can be found for every station.

The chosen 45-day interval was also successfully used in other empirical modeling studies (see Sect. 1): plant development and productivity seem to be driven by longer-term anomalies from the local mean of temperature, precipitation, wind, and snow depth. These meteorological boundary conditions influence plants as follows: temperature controls the rate of chemical processes in plants (Hatfield and Prueger 2015). Precipitation supplies the required water and, hence, reflects the water-stress conditions for the plant. This stress can be intensified by enhanced evapotranspiration caused by wind. Furthermore, wind can affect plants by means of mechanical movement and by changing leaf gas and heat exchange (Onoda and Anten 2011). It is also related to the North Atlantic Oscillation that is known for its effect on European phenology (Chmielewski and Rötzer 2001). Snow can protect plants from frost and provides moisture as it melts. Its depth also limits the sunlight a plant is receiving and, therefore, limits its photosynthesis.

In our study, the consideration of different regional models has demonstrated that the prediction error depends more on the station than on the regionalization. The overarching Bavarian model had almost the same quality in reproducing phenology as the regional models. The significant differences in prediction errors at the stations could be explained either by varying quality of the phenological data or additional factors like exposition, slope or land use-related anthropogenic influences that were not explicitly considered in this study.

Addressing a similar issue, Pollinger et al. (2017) showed that results from climate-phenology models based on different linear and nonlinear regression methods strongly resemble each other. Thus, choosing ordinary least square regression seems appropriate for this purpose. Other factors that were not available for this study but may affect phenology are global radiation, more specifically photosynthetic active radiation that provides the energy for photosynthesis (Neil and Wu 2006), and soil parameters. However, there is still a controversy whether and how these parameters influence plant development (McMaster and Wilhelm 1998; Repo et al. 2004; Apostol et al. 2007; Kolbe and Kaiser-Weiss 2015).

More generally, the quality of the phenological data must be scrutinized. Collected by volunteers, they depend on the conscientiousness and subjective perception of the observers who might interpret the phase definition differently. Larcher (2007) quantified the typical error generated by observers to ±2–3 days. Schaber and Badeck (2002) even estimate the error to be as high as 1 to 2 weeks. On the other hand, Baumgartner (1952) showed that phenology could exhibit a very large spatial variability over short distances and under similar boundary conditions (topography, meteorology, nutrient supply, age, breed of plant species). However, given the large sample size in our study, such site- and plant-specific variations should play a minor role. In fact, the performance of our model approach is quite good: every model setting led to an RMSE substantially smaller than the year-to-year standard deviation of DOY.

With the earlier onset of spring phases that can already be observed and will further aggravate into the twenty-first century the risk of plant damages due to late frost might increase despite a warming climate. While altogether a decline in frost days is expected, late frost days might not shift forward. Consequently, there is an increasing risk for blossoms and other growing stages until the year 2100 (cf. Bernhofer et al. 2011; IPCC 2013). This process might partly be counteracted by light-dependent plant mechanisms that have a demonstrable influence on phenology in spring (Linkosalo et al. 2000).

5 Conclusions

This study presents a robust statistical-dynamical modeling approach for computing present-day and future changes in phenology across Bavaria, southern Germany, by means of a set of significant climate predictors. All tested statistical model approaches generate satisfactory results in predicting phenology. It was found that reducing the number of meteorological predictors deteriorates the quality of prediction only to a minor extent but improves the statistical properties of the model. In addition, the regionalization of the model had no major impact on its performance, so this method is transferable to other regions. The main outcome of a negative phenological trend towards earlier onset dates during the twenty-first century is consistent with many previous studies (see Sects. 1 and 4). The advantage of our approach is its universal applicability. The model requires neither long time series nor an abundant set of meteorological variables. Using anomalies instead of absolute values allows for integrating station data from different elevations and landscapes into the same statistical model, enhancing the sample and making the model more robust. Therefore, we see a large potential for applications to various PPCs in other climate zones, taking our model as a tool to assess the link between shifting climate and vegetation zones all around the globe.