1 Background

The intensive forest monitoring (Level II) is part of the International Co-operative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests, http://icp-forests.net) under the umbrella of the Convention on Long-Range Transboundary Air Pollution of the United Nations Economic Commission for Europe (CLRTAP/UNECE). Across Europe, parameters such as meteorology, deposition, tree growth, and crown conditions are assessed on nearly 620 Level II plots following harmonized methods to study cause-effect relationships in forest ecosystems (Ferretti 2021). In Germany, data is available for up to 100 Level II plots, some of which share open-field meteorological stations; 68 plots are mandatory under the German forest law (BMJ 1975) and are operated by the forest research institutions of the federal states (Seidling 2005; Sanders et al. 2020).

In all surveys, data gaps can occur for various reasons (Sanders and Seidling 2012). However, daily meteorological observations such as air temperature and precipitation are key to assessing changes in forest ecosystems as they affect tree growth and vitality, nutrient cycles, and phenology (de Vries et al. 2014; Ruiz-Benito et al. 2020; Ziche and Seidling 2010).

To fill these gaps, daily temperature and precipitation are interpolated from the data of German Weather Service (DWD) stations using a hybrid approach of linear regression and inverse distance weighting (Müller-Westermeier 1995). The interpolated data is validated against the available measured Level II data and corrected for bias, and gaps in the measured data are then filled with the bias-corrected data.

2 Material and methods

2.1 Description of Level II plots

Meteorological data for 100 German Level II plots is available in the ICP Forests database. Of these, 78 plots with data covering at least ~20% of the 1996 to 2019 measurement period (about 5 years; number of days N ≥ 1746 out of 8766) were deemed suitable for validation and plot-specific bias correction after interpolation. The remaining plots had sparse data (at most about 3 years; N ≤ 1093 out of 8766), with half-yearly measurements and entire months missing in between; to avoid implausible results, they were not used for validation and bias correction. The plots are located along different environmental gradients across Germany. Climate data is recorded on open-field areas, generally less than 2000 m from the main forest plots, on a quasi-continuous basis and then aggregated to daily sums and means subject to a required degree of completeness (Raspe et al. 2016). The aim of this interpolation is to obtain ready-to-use datasets that can subsequently be extended without changing older data series (Rukh et al. 2022).
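The coverage threshold can be checked with simple date arithmetic. The following Python sketch is our own illustration, not the authors' code: it counts the days from 1 January 1996 to 31 December 2019 and compares a plot's number of measured days with the 1746-day threshold stated above. How measured days are counted per variable is an assumption, and the function names are hypothetical.

```python
from datetime import date

# Measurement period of the Level II climate data (from the text).
PERIOD_START = date(1996, 1, 1)
PERIOD_END = date(2019, 12, 31)

# Threshold reported in the text: at least 1746 measured days (~20%).
MIN_VALID_DAYS = 1746


def period_length_days(start: date, end: date) -> int:
    """Number of calendar days from start to end, both inclusive."""
    return (end - start).days + 1


def coverage_fraction(n_valid_days: int, start: date = PERIOD_START,
                      end: date = PERIOD_END) -> float:
    """Share of the period covered by measured days."""
    return n_valid_days / period_length_days(start, end)


def suitable_for_bias_correction(n_valid_days: int) -> bool:
    """True if a plot has enough measured days for validation and bias correction."""
    return n_valid_days >= MIN_VALID_DAYS


total_days = period_length_days(PERIOD_START, PERIOD_END)
print(total_days, round(coverage_fraction(MIN_VALID_DAYS), 3))  # 8766 0.199
```

The six leap years between 1996 and 2016 account for the 8766 days (24 × 365 + 6).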

2.2 Variables description

We interpolated the following variables and filled the gaps in their time series from 1996 to 2019 (see Table 1):

  a. Daily average of air temperature in °C, denoted by Tmean
  b. Daily minimum air temperature in °C, denoted by Tmin
  c. Daily maximum air temperature in °C, denoted by Tmax
  d. Daily sum of precipitation in mm, denoted by P

Table 1 Descriptive summary of the dataset for the four interpolated variables, showing the measurement period (during which empirical measurements were carried out), the period before it, the quality control criteria, and the total number of plots

Note that the complete dataset consists of two subsets. The first subset covers the measurement period on the Level II plots from the start of 1996 until the end of 2019; 1996 is the earliest year in which measurements of climatic parameters started. For some Level II plots, however, these measurements started later than 1996. In those cases, the missing values were also gap-filled after bias correction (see Sections 2.4 and 3).

The second subset covers the timeline from the start of 1961 until the end of 1995. In this time period, no empirical measurements of climatic parameters exist on Level II plots. This subset contains only interpolated data, which was also corrected for bias.

After interpolation and bias correction, checks for goodness of fit were performed (Table 2).

Table 2 Statistical summary of the climate data validation. Bias on the 78 plots was calculated as the difference between the daily empirical and interpolated values and averaged for each plot. For each climatic variable, the mean bias in the table is the mean of these per-plot values across plots; the SD, mean correlation, mean R², and mean RMSE were likewise averaged across plots

2.3 DWD stations

From the German Weather Service (DWD), a total of 1213 climate stations are available across Germany (DWD CDCa 2021); not all of them were active continuously throughout the observation period. We used them to interpolate Tmean, Tmin, and Tmax. To interpolate daily precipitation P, we used the data from the larger and therefore denser precipitation network of the DWD (DWD CDCb 2021). It includes observations from the abovementioned 1213 stations and from additional precipitation monitoring stations, resulting in a total of 5619 stations used to interpolate precipitation. These additional precipitation stations were likewise not necessarily active continuously throughout the observation period.

The geographical locations of the Level II plots as well as the DWD climate and precipitation stations within Germany are presented in Fig. 1.

Fig. 1

Map showing the geographical locations of the ICP Forests Level II plots and DWD stations across Germany (source: DWD CDCa 2021; DWD CDCb 2021). The gray circles represent a 50 km radius around the Level II plots

2.4 Hybrid interpolation approach and bias correction

Based on the approach of Müller-Westermeier (1995), we performed the interpolation of the climate data in four steps (also see Fig. 2):

  1. Parameterize a general linear relationship between any given climate variable $k$ and the elevation $h$ above sea level (in meters) by linear regression using all DWD stations (Eq. 1):

$${k}_{DWD}={b}_{DWD}+a\cdot {h}_{DWD}$$
(1)

where $a$ and $b_{DWD}$ denote the slope and the intercept of the regression, respectively. Eq. 1 was fitted separately for each day from 1961 to 2019, using only the DWD stations active on that day. For practical reasons, and to base the regression on as many data points as possible, the slope $a$ was estimated from all DWD stations active in Germany on that day, irrespective of their distance to a Level II plot.

Fig. 2

A flowchart diagram outlining the four steps of the interpolation approach of Müller-Westermeier (1995). Bias correction using the linear scaling method (Luo et al. 2018; Lenderink et al. 2007), performed as an additional step, is also shown
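Step 1 amounts to an ordinary least-squares fit repeated for every day and variable. The following is a minimal Python sketch, assuming station elevations and values are passed as plain lists and that inactive stations are marked with `None`; the function name `fit_elevation_slope` is hypothetical and not taken from the authors' routine.

```python
def fit_elevation_slope(elevations_m, values):
    """Least-squares fit of k = b + a * h (Eq. 1) for one variable on one day.

    elevations_m: station elevations h in metres
    values:       observed values k at the same stations; None marks a
                  station that was not active on that day and is skipped
    Returns (a, b): slope per metre and intercept (value at sea level).
    """
    pairs = [(h, k) for h, k in zip(elevations_m, values) if k is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two active stations")
    mean_h = sum(h for h, _ in pairs) / n
    mean_k = sum(k for _, k in pairs) / n
    s_hh = sum((h - mean_h) ** 2 for h, _ in pairs)
    if s_hh == 0:
        raise ValueError("station elevations must not all be equal")
    s_hk = sum((h - mean_h) * (k - mean_k) for h, k in pairs)
    a = s_hk / s_hh
    b = mean_k - a * mean_h
    return a, b


# Hypothetical stations lying exactly on k = 15.0 - 0.006 * h;
# the fourth station is inactive (None) and ignored.
a, b = fit_elevation_slope([0, 500, 1000, 300], [15.0, 12.0, 9.0, None])
print(round(a, 4), round(b, 2))  # -0.006 15.0
```

Only the slope $a$ is carried forward to Steps 2 to 4; the intercept is returned for completeness.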

  2. With knowledge of the slope $a$, the climatic parameter at each DWD station $i$ can be reduced to sea level (Eq. 2):

$${b}_{i, DWD}={k}_{i, DWD}-a\cdot {h}_{i, DWD}$$
(2)
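As an illustrative check of Eq. 2 with hypothetical numbers (not taken from the dataset), assume a daily temperature slope of $a = -0.006$ °C m⁻¹ and a DWD station at $h_{i, DWD} = 400$ m recording $k_{i, DWD} = 10.0$ °C:

$$b_{i, DWD} = 10.0 - (-0.006)\cdot 400 = 10.0 + 2.4 = 12.4\ ^{\circ}\mathrm{C}$$

Because the slope is negative, the value reduced to sea level is higher than the value observed at the station.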

  3. We then applied inverse distance weighting (IDW) to the sea-level values $b_{i, DWD}$ from Eq. 2 to interpolate the climatic parameter, reduced to sea level, at the Level II plot (Eq. 3):

$${b}_{LII}=\frac{\sum_{i=1}^n\frac{b_{i, DWD}}{d_i^2}}{\sum_{i=1}^n\frac{1}{d_i^2}}$$
(3)

The interpolated value $b_{LII}$ is the climatic parameter reduced to sea level at the corresponding Level II plot. The index $i$ denotes the DWD stations within a radius of 50 km of the respective Level II plot, $n$ is the total number of DWD stations within this radius that were used for interpolation, and $d_i$ is the distance (at most 50 km) between DWD station $i$ and the Level II plot. This radius was chosen to include as many active DWD stations as possible for IDW throughout the period of interest.
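Eq. 3 with the 50 km search radius can be sketched in Python as follows. This is a hypothetical helper, not the authors' implementation: it assumes station and plot coordinates are projected planar coordinates in kilometres (the text does not state how distances were computed), and returning the station value directly when a station coincides with the plot is our own choice to avoid division by zero.

```python
import math


def idw_within_radius(stations, plot_xy_km, radius_km=50.0, power=2):
    """Inverse distance weighting (Eq. 3) with a search radius.

    stations:   iterable of (x_km, y_km, value); value None = inactive station
    plot_xy_km: (x_km, y_km) of the Level II plot
    Coordinates are assumed to be planar (projected) and in kilometres.
    Returns the weighted mean, or None if no active station is in range.
    """
    px, py = plot_xy_km
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        if value is None:
            continue  # station not active on this day
        d = math.hypot(x - px, y - py)
        if d > radius_km:
            continue  # outside the search radius
        if d == 0.0:
            return value  # assumption: a co-located station is used directly
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    if weight_total == 0.0:
        return None
    return weighted_sum / weight_total


# Hypothetical sea-level values; the station at 60 km lies outside the radius.
stations = [(10.0, 0.0, 12.0), (20.0, 0.0, 13.0), (60.0, 0.0, 99.0)]
print(round(idw_within_radius(stations, (0.0, 0.0)), 3))  # 12.2
```

The nearer station receives four times the weight of the station twice as far away, which is the effect of the squared distance in Eq. 3.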

  4. To obtain the value at the actual elevation $h_{LII}$ of the Level II plot, we used Eq. 4, where $k_{LII}$ denotes the climatic variable at the Level II plot:

$${k}_{LII}={b}_{LII}+a\cdot {h}_{LII}$$
(4)
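Putting Steps 1 to 4 together, a single day's interpolation for one variable might look like the Python sketch below. It is an illustration under stated assumptions, not the authors' code: station records are dictionaries with hypothetical keys `x_km`, `y_km` (projected coordinates in km), `h_m` (elevation in m), and `value` (`None` for inactive stations). The slope uses all active stations, whereas the IDW uses only those within the radius, mirroring the description above.

```python
import math


def interpolate_day(stations, plot, radius_km=50.0):
    """Steps 1-4 (Eqs. 1-4) for one climate variable on one day.

    stations: list of dicts with keys "x_km", "y_km", "h_m", "value"
              ("value" is None if the station was inactive that day)
    plot:     dict with keys "x_km", "y_km", "h_m"
    Returns the interpolated value at plot elevation, or None.
    """
    active = [s for s in stations if s["value"] is not None]
    if len(active) < 2:
        return None

    # Step 1 (Eq. 1): slope a from all active stations, regardless of distance.
    n = len(active)
    mean_h = sum(s["h_m"] for s in active) / n
    mean_k = sum(s["value"] for s in active) / n
    s_hh = sum((s["h_m"] - mean_h) ** 2 for s in active)
    if s_hh == 0:
        return None
    a = sum((s["h_m"] - mean_h) * (s["value"] - mean_k) for s in active) / s_hh

    weighted_sum = 0.0
    weight_total = 0.0
    for s in active:
        # Step 2 (Eq. 2): reduce the station value to sea level.
        b_i = s["value"] - a * s["h_m"]
        # Step 3 (Eq. 3): inverse-distance-squared weighting within the radius.
        d = math.hypot(s["x_km"] - plot["x_km"], s["y_km"] - plot["y_km"])
        if d > radius_km:
            continue
        if d == 0.0:
            weighted_sum, weight_total = b_i, 1.0  # assumption: co-located station is used directly
            break
        weighted_sum += b_i / d ** 2
        weight_total += 1.0 / d ** 2
    if weight_total == 0.0:
        return None
    b_lii = weighted_sum / weight_total

    # Step 4 (Eq. 4): bring the sea-level value back to the plot elevation.
    return b_lii + a * plot["h_m"]


stations = [
    {"x_km": 10.0, "y_km": 0.0, "h_m": 0.0, "value": 15.0},
    {"x_km": 0.0, "y_km": 20.0, "h_m": 500.0, "value": 12.0},
    {"x_km": 30.0, "y_km": 30.0, "h_m": 1000.0, "value": 9.0},
    {"x_km": 200.0, "y_km": 0.0, "h_m": 200.0, "value": 13.8},  # slope only
]
plot = {"x_km": 0.0, "y_km": 0.0, "h_m": 800.0}
print(round(interpolate_day(stations, plot), 2))  # 10.2
```

In the example, all hypothetical stations lie on the line $k = 15.0 - 0.006\,h$, so every sea-level value equals 15.0 and the result at 800 m is $15.0 - 0.006 \cdot 800 = 10.2$.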

Interpolation introduces biases due to systematic model errors (Luo et al. 2018), for example from the model parametrization used to derive a climate variable; this applies especially to precipitation, which is inherently heterogeneous (Herrera et al. 2010; Pan et al. 2001). In addition, the statistical distributions of the measured and the interpolated (i.e., modeled) data may differ, which calls for a correction (Ayar et al. 2021; Ivanov et al. 2018).

To correct for bias, i.e., the difference between the daily measured and interpolated climate values, we applied the linear scaling method outlined by Luo et al. (2018; see also Lenderink et al. 2007). For daily temperature (Tmean, Tmin, and Tmax), the correction factor was calculated as the difference between the monthly mean of the daily measured temperature and the monthly mean of the daily interpolated temperature. This factor was added to the daily interpolated temperature to correct for its bias (Eq. 5).

$${T}_{corr,\ daily}={T}_{int,\ daily}+\left[{T}_{obs,\ monthly\ mean}-{T}_{int,\ monthly\ mean}\right]$$
(5)

$T_{corr,\ daily}$, $T_{int,\ daily}$, $T_{obs,\ monthly\ mean}$, and $T_{int,\ monthly\ mean}$ are the corrected daily temperature, the interpolated daily temperature, the monthly mean of the daily measured temperature, and the monthly mean of the daily interpolated temperature on the Level II plots, respectively. The difference within the square brackets is the correction factor. Eq. 5 applies equally to Tmean, Tmin, and Tmax.
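The year-specific additive correction of Eq. 5 can be sketched in Python as follows. The helper names are hypothetical, and we assume that the monthly means of the measured and interpolated values are computed over the same days, i.e., only days with a measured value; the text does not specify this pairing.

```python
from collections import defaultdict
from datetime import date


def paired_monthly_means(obs, interp):
    """Mean measured and interpolated values per (year, month).

    Only days present in both series (measured value not None) are used.
    """
    sum_obs = defaultdict(float)
    sum_int = defaultdict(float)
    count = defaultdict(int)
    for day, t_obs in obs.items():
        t_int = interp.get(day)
        if t_obs is None or t_int is None:
            continue
        key = (day.year, day.month)
        sum_obs[key] += t_obs
        sum_int[key] += t_int
        count[key] += 1
    return {k: (sum_obs[k] / count[k], sum_int[k] / count[k]) for k in count}


def correct_temperature(obs, interp):
    """Additive linear scaling (Eq. 5) with one factor per (year, month).

    Returns date -> corrected value; None where the month has no measured data.
    """
    means = paired_monthly_means(obs, interp)
    corrected = {}
    for day, t_int in interp.items():
        m = means.get((day.year, day.month))
        corrected[day] = None if m is None else t_int + (m[0] - m[1])
    return corrected


obs = {date(2000, 1, 1): 1.0, date(2000, 1, 2): 3.0}
interp = {date(2000, 1, 1): 0.0, date(2000, 1, 2): 1.0,
          date(2000, 1, 3): 2.0, date(2000, 2, 1): 5.0}
# January 2000 factor: 2.0 - 0.5 = 1.5; February 2000 has no measured data.
corrected = correct_temperature(obs, interp)
```

With this pairing, the corrected values reproduce the measured monthly mean exactly on the paired days, so the mean bias over those days is zero.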

For precipitation P, the correction factor was calculated as the ratio of the monthly mean of the daily measured precipitation to the monthly mean of the daily interpolated precipitation. The daily interpolated precipitation was multiplied by this factor to correct for its bias (Eq. 6).

$${P}_{corr,\ daily}={P}_{int,\ daily}\cdot \left[\frac{P_{obs,\ monthly\ mean}}{P_{int,\ monthly\ mean}}\right]$$
(6)

$P_{corr,\ daily}$, $P_{int,\ daily}$, $P_{obs,\ monthly\ mean}$, and $P_{int,\ monthly\ mean}$ are the corrected daily precipitation, the interpolated daily precipitation, the monthly mean of the daily measured precipitation, and the monthly mean of the daily interpolated precipitation, respectively. The ratio within the square brackets is the correction factor.
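The multiplicative correction of Eq. 6 follows the same pattern. In this hypothetical Python sketch, again assuming paired days, the factor for a month whose interpolated precipitation sums to zero is undefined; leaving such values unchanged is our assumption, not something stated in the text.

```python
from collections import defaultdict
from datetime import date


def correct_precipitation(obs, interp):
    """Multiplicative linear scaling (Eq. 6) with one factor per (year, month).

    obs, interp: dicts mapping datetime.date -> daily precipitation in mm.
    Only days with both values are used; with the same days in numerator and
    denominator, the ratio of monthly means equals the ratio of monthly sums.
    Returns date -> corrected value, or None for months without measured data.
    """
    sum_obs = defaultdict(float)
    sum_int = defaultdict(float)
    for day, p_obs in obs.items():
        p_int = interp.get(day)
        if p_obs is None or p_int is None:
            continue
        key = (day.year, day.month)
        sum_obs[key] += p_obs
        sum_int[key] += p_int
    corrected = {}
    for day, p_int in interp.items():
        key = (day.year, day.month)
        if key not in sum_obs:
            corrected[day] = None  # no measured data in this month
        elif sum_int[key] == 0.0:
            corrected[day] = p_int  # assumption: undefined ratio, keep value
        else:
            corrected[day] = p_int * sum_obs[key] / sum_int[key]
    return corrected


days = [date(2000, 1, d) for d in (1, 2, 3)]
obs = dict(zip(days, [2.0, 0.0, 4.0]))
interp = dict(zip(days, [1.0, 1.0, 1.0]))
corrected = correct_precipitation(obs, interp)  # factor 6.0 / 3.0 = 2.0 on every day
```

Because the factor is multiplicative, dry interpolated days stay dry and wet days stay wet; linear scaling adjusts amounts but not the number of wet days.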

Note that this linear scaling method calculates a correction factor specific to each month of each year and uses it to correct the daily interpolated values of that month in that year. Because we also interpolated data from 1961 to 1995, when no measured data was available to calculate such factors, this year-specific correction could not be applied to that period. Instead, we calculated a universal correction factor over the measured period from 1996 to 2019 that is specific to each calendar month but not to an individual year, and that can therefore also be applied to interpolated data outside the measured period. For this purpose, $T_{obs,\ monthly\ mean}$, $T_{int,\ monthly\ mean}$, $P_{obs,\ monthly\ mean}$, and $P_{int,\ monthly\ mean}$ were calculated for each calendar month over 1996 to 2019, irrespective of the year. We then applied this universal correction factor, i.e., the term in square brackets in Eqs. 5 and 6 calculated for the measured period, in the same manner to the daily interpolated data from 1961 to 1995.
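The universal, calendar-month-specific factor differs from Eqs. 5 and 6 only in how the monthly means are grouped: by calendar month pooled over 1996 to 2019 instead of by year and month. The Python sketch below is illustrative; the function names, the pairing of days, and the fallback factor of 1.0 for an undefined ratio are assumptions.

```python
from collections import defaultdict
from datetime import date


def universal_factors(obs, interp, additive=True):
    """Calendar-month correction factors pooled over all measured years.

    additive=True  -> temperature, bracket term of Eq. 5 (difference)
    additive=False -> precipitation, bracket term of Eq. 6 (ratio)
    """
    sum_obs = defaultdict(float)
    sum_int = defaultdict(float)
    count = defaultdict(int)
    for day, v_obs in obs.items():
        v_int = interp.get(day)
        if v_obs is None or v_int is None:
            continue
        sum_obs[day.month] += v_obs
        sum_int[day.month] += v_int
        count[day.month] += 1
    factors = {}
    for month, n in count.items():
        if additive:
            factors[month] = sum_obs[month] / n - sum_int[month] / n
        elif sum_int[month] != 0.0:
            factors[month] = sum_obs[month] / sum_int[month]
        else:
            factors[month] = 1.0  # assumption: undefined ratio, no correction
    return factors


def apply_factors(interp, factors, additive=True):
    """Apply month-specific factors to any interpolated series, e.g. 1961-1995."""
    out = {}
    for day, v in interp.items():
        f = factors.get(day.month)
        if f is None:
            out[day] = None  # no measured data for this calendar month
        else:
            out[day] = v + f if additive else v * f
    return out


obs = {date(1996, 1, 10): 2.0, date(2005, 1, 20): 4.0}
interp = {date(1996, 1, 10): 1.0, date(2005, 1, 20): 2.0}
jan_factor = universal_factors(obs, interp)[1]  # 3.0 - 1.5 = 1.5
early = apply_factors({date(1970, 1, 5): 0.0, date(1970, 2, 5): 1.0},
                      universal_factors(obs, interp))
```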

3 Technical validation

Before the interpolation (Section 2.4), we performed plausibility checks on the Level II climate data according to the quality control guidelines in the ICP Forests manual (Raspe et al. 2016), and applied the same checks to the DWD temperature and precipitation data. These checks used a minimum daily completeness (%) of the data as well as minimum and maximum plausible values for each of the variables to be interpolated (see Table 1). Data that did not fulfil these criteria was discarded.
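The plausibility screening can be expressed as a simple filter. In this Python sketch the thresholds are function parameters; the values used in the example are placeholders chosen for illustration and are not the criteria listed in Table 1, and the record layout (a daily value plus its completeness in %) is an assumption.

```python
def is_plausible(value, completeness_pct, min_completeness_pct, lower, upper):
    """Daily plausibility check: sufficient completeness and value within range.

    All thresholds are parameters; the actual ones are listed in Table 1.
    """
    if value is None or completeness_pct is None:
        return False
    return completeness_pct >= min_completeness_pct and lower <= value <= upper


def filter_plausible(records, min_completeness_pct, lower, upper):
    """Keep only days that pass the check; records: day -> (value, completeness %)."""
    return {
        day: value
        for day, (value, completeness) in records.items()
        if is_plausible(value, completeness, min_completeness_pct, lower, upper)
    }


# Placeholder thresholds for illustration only (NOT the Table 1 values).
records = {
    "2001-07-01": (21.4, 100.0),
    "2001-07-02": (65.0, 100.0),  # value outside the placeholder range
    "2001-07-03": (19.8, 40.0),   # completeness below the placeholder minimum
}
kept = filter_plausible(records, min_completeness_pct=90.0, lower=-40.0, upper=45.0)
```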

After interpolation, the daily bias values (the difference between the daily interpolated and measured Level II data) were averaged for each plot to give its mean bias, and the standard deviation of the daily bias values for each plot was also calculated. Pearson's correlation coefficient and the coefficient of determination (R²) describe the agreement between the interpolated and the measured Level II data, and the root-mean-square error (RMSE) quantifies the model performance for each variable. Correcting for bias significantly improved the model performance for all interpolated variables (Table 2; p < 0.01) and brought the mean bias for each plot to zero. In addition, we provide visual assessment files in .pdf format that compare the measured data with the bias-corrected data.
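The per-plot statistics described above can be computed from paired daily values as in the following Python sketch. It is illustrative only: it uses the sample standard deviation (n − 1) and computes R² as the square of Pearson's r, which holds for a simple linear fit; the text does not specify either choice. Bias is taken as interpolated minus measured, as stated in this section.

```python
import math


def validation_stats(measured, interpolated):
    """Validation statistics for one plot and variable on paired daily values.

    Bias is interpolated minus measured; days where either value is
    missing (None) are skipped.
    """
    pairs = [(m, i) for m, i in zip(measured, interpolated)
             if m is not None and i is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired days")
    bias = [i - m for m, i in pairs]
    mean_bias = sum(bias) / n
    sd_bias = math.sqrt(sum((b - mean_bias) ** 2 for b in bias) / (n - 1))
    mean_m = sum(m for m, _ in pairs) / n
    mean_i = sum(i for _, i in pairs) / n
    s_mi = sum((m - mean_m) * (i - mean_i) for m, i in pairs)
    s_mm = sum((m - mean_m) ** 2 for m, _ in pairs)
    s_ii = sum((i - mean_i) ** 2 for _, i in pairs)
    # Pearson's r; raises ZeroDivisionError if either series is constant.
    r = s_mi / math.sqrt(s_mm * s_ii)
    rmse = math.sqrt(sum(b * b for b in bias) / n)
    return {"mean_bias": mean_bias, "sd_bias": sd_bias,
            "r": r, "r2": r * r, "rmse": rmse}


# Hypothetical values with a constant offset of +1; the last day is unpaired.
stats = validation_stats([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, None])
```

A constant offset yields a perfect correlation but a nonzero mean bias and RMSE, which is why both kinds of metrics are reported.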

4 Reuse potential and limits

The daily gap-filled and extended time series of climatic variables can be aggregated to any chosen temporal scale. They offer opportunities to characterize climatic conditions on Level II plots, for example, based on the climatic water balance. The completed time series of temperature and precipitation also allow the calculation of drought indices such as the standardized precipitation evapotranspiration index (SPEI, Vicente-Serrano et al. 2010).
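Aggregating the daily series to a coarser scale is straightforward. The hypothetical Python helper below sums precipitation and averages temperature per calendar month. Monthly sums of P could then feed, for example, a climatic water balance (P minus potential evapotranspiration), although potential evapotranspiration is not part of this dataset and would need to be estimated separately.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(daily, how):
    """Aggregate a gap-free daily series (date -> value) to (year, month).

    how="sum" suits precipitation P; how="mean" suits Tmean, Tmin, and Tmax.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in daily.items():
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    if how == "sum":
        return dict(sums)
    return {key: sums[key] / counts[key] for key in sums}


precip = {date(2019, 1, 1): 1.0, date(2019, 1, 2): 2.5, date(2019, 2, 1): 0.5}
monthly_p = aggregate_monthly(precip, "sum")  # {(2019, 1): 3.5, (2019, 2): 0.5}
```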

Our interpolation routine is a flexible implementation of the method used. However, it does not include homogeneity checks. Such checks along the temperature and precipitation time series could additionally be performed to correct for structural breaks, which may arise from spatial variability in the climate variables at different weather stations, especially for precipitation, but also from changes in measuring devices and plot surroundings. Nevertheless, the validation checks in Table 2 suggest a reasonably good performance of the interpolation routine and usability of the data. Given this performance on a large dataset, the raw interpolated data is also suitable for use where sparse measured data limits the options for bias correction. We therefore also provide raw interpolated data without bias correction for the plots with sparse measured data.

Additionally, we identified a few measured data points as anomalous and provide visualizations of them in the folder “possible anomalies.” Since we cannot rule out that they are measurement errors, it is up to users whether to replace these data points with the provided bias-corrected data.

5 Access to the data and metadata description

The initial, untreated meteorological Level II data of air temperature and precipitation is archived by the Programme Co-ordinating Centre (PCC) of ICP Forests in Eberswalde, Germany. For future use, including data beyond 2019, it is available on request at http://icp-forests.net via the official data request form. Requests are evaluated for their scientific purpose, and access is usually granted within 2 weeks.

The DWD climate and precipitation data used in the interpolation procedure is freely available from the DWD (DWD CDCa 2021 and DWD CDCb 2021, respectively).

The processed data (gap-filled and bias-corrected time series, and the statistical evaluations) is archived in the repository at https://www.openagrar.de/receive/openagrar_mods_00079174. It contains comma-separated value tables (.csv) for daily mean, minimum, and maximum air temperature and daily precipitation. Note that each time series is split into two parts covering either 1961 to 1995 or 1996 to 2019, and only the latter includes the gap-filled time series. The dataset is further divided into bias-corrected and raw interpolated time series for each plot. For transparency, we also publish the untreated meteorological data from 1996 to 2019, so that users can inspect the whole dataset and not only the gap-filled and bias-corrected data. The metadata file https://metadata-afs.nancy.inra.fr/geonetwork/srv/fre/catalog.search#/metadata/433a028f-dfc8-4a7c-82af-b8d7efafd724 provides comprehensive information on the available datasets and the data structure within the repository, including the following:

  1. A basic description of the Level II plots, including the plot code, plot coordinates, plot names, and elevation
  2. Technical variable descriptions and their location within the data structure of the repository, i.e., the names of the .csv files that contain the prepared data
  3. Contact information of the authors