Introduction

Plant phenological data, although they are low-technology, subjective observations often made by volunteers, are a prerequisite for studies in several modern research fields, e.g. climate change (Badeck et al. 2004; Menzel and Fabian 1999; Wolkovich et al. 2012; Penuelas et al. 2009; Fridley 2012; Chuine et al. 2004; Menzel et al. 2006; Rosenzweig et al. 2007) and crop yields (Tao et al. 2006), and are of interest to the agricultural and pharmaceutical industries. Because flexible and free access to phenological data is limited, we built an online database (PPODB) that provides flexible, free and unrestricted access to plant phenological observations. We compiled data from various sources and made them accessible in a consistent format. In addition, we digitized historical data, thereby, for the first time, providing phenological observations spanning 130 years (1880–2009) for a large geographical region, Germany, in a consistent format. The data can be retrieved either as time series for a given plant, phase and station via a simple geo-referenced interface, or through full-fledged SQL access that allows a wide range of individual queries (examples are given below). Moreover, we provide combined time series for Natural Regions, which are corrected for outliers.

Data and methods

In the following, we use ‘observation’ to mean the reported day of the year on which a certain phenological phase (e.g. blossoming) of a certain plant was observed at a certain location (station). The data in PPODB are compiled from three distinct databases (see Table 1 for summary statistics):

  • Phenological observations collected by the Deutscher Wetterdienst (German meteorological service, DWD) from 1951 to 2009 (database DWD in Table 1).

  • The historical phenological database provided by the DWD, which is a collection of phenological observations from Central Europe, mainly Germany, covering the years 1880 until 1941 compiled from various sources (database HPDB in Table 1).

  • To fill the gap between the two aforementioned databases, we digitized phenological data that were available only in printed form. These data were collected by the volunteer network of the Deutscher Reichswetterdienst and published after World War II; they cover the years 1922 to 1944. Additionally, we digitized phenological data published between 1951 and 1961 in the meteorological yearbooks of the DWD (DWD 1951, 1953, 1960, 1961, 1991). Taken together, these historical data cover the years 1921–1955 and are made publicly available here for the first time (database HIS in Table 1).
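Observations are stored as days of the year rather than as calendar dates. The minimal Python sketch below shows the conversion, assuming the common convention that 1 January is day 1; whether and how PPODB adjusts for leap years is not stated in this text, and the helper name `day_of_year` is hypothetical.

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Return the day of the year for a calendar date (1 January = day 1).

    Hypothetical helper for illustration; PPODB's own conversion is not described.
    """
    return d.timetuple().tm_yday


# The same calendar date maps to different day-of-year values in leap years,
# which matters when comparing raw day-of-year values across years.
print(day_of_year(date(2009, 5, 1)))  # 121 (2009 is not a leap year)
print(day_of_year(date(2008, 5, 1)))  # 122 (2008 is a leap year)
```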

Table 1 PPODB overview. Data are given as numbers of stations, phases, observations and observation periods in the different combined databases including plant varieties

Table 1 also gives the geographical area covered by each database. The countries, numbers of observations and observational time ranges per country are listed in Supplementary Table S1. For more details and a full description of the database, please refer to the online documentation at www.ppodb.de.

Results

The database can be accessed in three ways: as time series for single stations, as time series for Natural Regions, or through full-fledged SQL access. We briefly illustrate each of these three features.

When accessing the database via www.ppodb.de, the user first sees a start page on which one of two perspectives, single stations or Natural Regions, can be chosen for a given group of plants (Fig. 1).

Fig. 1

PPODB start interface (screenshot from browser)

For clarity, we grouped the different plant types into agricultural plants, fruits, wild-growing plants and vines.

Single stations

In the single station perspective, plants, their phases and stations can be selected via drop-down menus (Fig. 2). Stations can also be selected by clicking on the respective markers in the map. Initially, a map of Central Europe is displayed in which, for a better overview, most stations are grouped into clusters shown as coloured circles. The number on each coloured circle indicates how many stations the cluster represents. Clicking on a cluster symbol zooms into the map until the locations of single stations become visible. Single stations are marked by red balloons, which show general information about the station, such as station name, longitude, latitude, altitude, and the number and range of years in which observations were made (Fig. 2).

Fig. 2

Single station perspective. Example for the phase ‘beginning of flowering’ of horse chestnut at the station ‘Geisenheim(DWD)’ (screenshot). Balloons indicate phenological stations. Clicking on a balloon displays meta-information about the station

Once plant, phase and station are selected, the corresponding time series can be displayed as a graph (menu ‘plot only’), as a table (menu ‘data only’) or both (menu ‘data and plot’). Optionally, a trend line can be added, together with the estimated trend and its P-value (Fig. 3).
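The text does not state which trend model or test PPODB uses. Assuming an ordinary least-squares linear trend of day of year against year, the Python sketch below computes the slope and its t statistic with the standard library only; the function name `linear_trend` and the example data are hypothetical. The two-sided P-value would then follow from the Student t distribution with n - 2 degrees of freedom.

```python
import math


def linear_trend(years, doys):
    """OLS trend of day of year against year (hypothetical helper).

    Returns (slope in days per year, t statistic of the slope). The two-sided
    P-value is the Student-t tail probability with len(years) - 2 degrees of
    freedom, e.g. 2 * scipy.stats.t.sf(abs(t), n - 2) if SciPy is available.
    """
    n = len(years)
    if n < 3:
        raise ValueError("need at least three years to estimate a trend and its error")
    x_mean = sum(years) / n
    y_mean = sum(doys) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, doys))
    slope = sxy / sxx
    # Residuals from the fitted line, written in centred form.
    residuals = [(y - y_mean) - slope * (x - x_mean) for x, y in zip(years, doys)]
    s2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    se_slope = math.sqrt(s2 / sxx)
    if se_slope == 0:  # perfect fit: t is infinite (or undefined for a zero slope)
        t = math.copysign(math.inf, slope) if slope else math.nan
    else:
        t = slope / se_slope
    return slope, t


slope, t = linear_trend([2000, 2001, 2002, 2003, 2004], [120, 118, 119, 116, 115])
print(round(slope, 3), round(t, 2))  # -1.2 -3.93
```

A negative slope means the phase occurs earlier over time, here by about 1.2 days per year in the invented data.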

Fig. 3

Plot of the time series for the phase ‘beginning of flowering’ of horse chestnut at the station ‘Geisenheim(DWD)’. The colour code of the data points indicates the source of the corresponding observation (screenshot), which is also displayed in the corresponding table

If a station is present in more than one database, the respective observations are colour coded (Fig. 3). Note that observations for the same station and year may have been reported in different databases with differing values. We kept all reported observations, even though in such cases the day on which the respective phase occurred is ambiguous.
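Such ambiguous cases can be located with a grouping query. The sketch below uses Python's built-in `sqlite3` module and an invented in-memory table as a stand-in for PPODB. The table name `all_pheno_obs` and the columns `stat_id`, `phase_id` and `obs_year` are taken from the queries in the Fig. 4 caption, whereas `day_of_year` and `source_db` are hypothetical column names, so the query would need adapting to the real schema.

```python
import sqlite3

# Toy in-memory stand-in for PPODB. stat_id, phase_id and obs_year follow the
# Fig. 4 queries; day_of_year and source_db are HYPOTHETICAL column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_pheno_obs ("
    " stat_id INTEGER, phase_id INTEGER, obs_year INTEGER,"
    " day_of_year INTEGER, source_db TEXT)"
)
conn.executemany(
    "INSERT INTO all_pheno_obs VALUES (?, ?, ?, ?, ?)",
    [
        (1, 5, 1951, 130, "DWD"),
        (1, 5, 1951, 133, "HIS"),  # same station, phase and year, different day
        (1, 5, 1952, 128, "DWD"),
        (1, 5, 1952, 128, "HIS"),  # reported twice, but consistently
        (2, 5, 1951, 125, "DWD"),
    ],
)

# Station-phase-year combinations reported with more than one distinct day.
ambiguous = conn.execute(
    """
    SELECT stat_id, phase_id, obs_year,
           MIN(day_of_year) AS earliest, MAX(day_of_year) AS latest
    FROM all_pheno_obs
    GROUP BY stat_id, phase_id, obs_year
    HAVING COUNT(DISTINCT day_of_year) > 1
    """
).fetchall()
print(ambiguous)  # [(1, 5, 1951, 130, 133)]
```

Consistent duplicates (station 1 in 1952) are not flagged, because only the number of distinct reported days is tested.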

The Supplementary Material provides an additional summary table listing all species–phase combinations in the combined database that are still observed by the German Weather Service, together with their numbers of stations and observations and the average length of the time series per plant, phase and station.

Natural regions

One of the main reasons for constructing this database and merging stations from different databases was to enable the construction of long phenological time series, so-called combined time series, in order to study the effect of climate change on plant phenology (Schaber 2002; Schaber and Badeck 2002, 2003, 2005; Schaber et al. 2010). A combined time series is a sophisticated average over many time series that corrects for the artefacts which simple averages introduce because observations are unequally distributed in time and space (Schaber et al. 2010). Fig. 4 shows histograms of the number of time series of a given length for single stations and for Natural Regions. For Natural Regions, the number of long time series increases substantially at the expense of short ones. More than 480 combined time series, each for a particular phenological phase in a particular Natural Region, cover more than 100 years.
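The combination method itself is described in Schaber et al. (2010) and is not reproduced here. As a hedged illustration of why a simple yearly average is biased, the Python sketch below fits a plain additive model, day of year = year effect + station effect, by alternating means (backfitting) on invented data in which a late-observing station stops reporting. This simplified model is an assumption made for illustration: it omits the outlier correction mentioned in the Introduction and any other refinements of the published method, and the function name `combined_series` is hypothetical.

```python
from collections import defaultdict


def combined_series(obs, n_iter=500):
    """Fit day_of_year = year_effect + station_effect by alternating means
    (backfitting) and return one value per year at the level of an average
    station. obs maps (station, year) -> day of year; gaps are allowed.
    Hypothetical helper, not the published PPODB method."""
    by_year = defaultdict(list)     # year -> [(station, doy), ...]
    by_station = defaultdict(list)  # station -> [(year, doy), ...]
    for (station, year), doy in obs.items():
        by_year[year].append((station, doy))
        by_station[station].append((year, doy))

    station_eff = {s: 0.0 for s in by_station}
    year_eff = {y: 0.0 for y in by_year}
    for _ in range(n_iter):
        # Year effect: mean of that year's observations minus station offsets.
        for y, pairs in by_year.items():
            year_eff[y] = sum(doy - station_eff[s] for s, doy in pairs) / len(pairs)
        # Station effect: mean of that station's observations minus year effects.
        for s, pairs in by_station.items():
            station_eff[s] = sum(doy - year_eff[y] for y, doy in pairs) / len(pairs)

    level = sum(station_eff.values()) / len(station_eff)
    return {y: year_eff[y] + level for y in sorted(year_eff)}


# Invented data: station B observes 20 days later than A and stops reporting in 2002.
obs = {
    ("A", 2000): 100, ("A", 2001): 105, ("A", 2002): 110,
    ("B", 2000): 120, ("B", 2001): 125,
}
simple = {}
for y in (2000, 2001, 2002):
    vals = [doy for (s, yy), doy in obs.items() if yy == y]
    simple[y] = sum(vals) / len(vals)

combined = combined_series(obs)
print(simple)                                         # {2000: 110.0, 2001: 115.0, 2002: 110.0}
print({y: round(v, 6) for y, v in combined.items()})  # {2000: 110.0, 2001: 115.0, 2002: 120.0}
```

The simple average suggests a spurious advance in 2002 only because the late station dropped out; the additive fit keeps the steady 5-day-per-year delay present in both stations.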

Fig. 4

Histograms of the number of time series per length of time series over all species, phases and single stations (left panel) and Natural Regions (right panel), respectively. The corresponding data can be retrieved from the database with the following SQL queries (left panel): “select c, count(c) from (select stat_id, phase_id, count(distinct obs_year) as c from all_pheno_obs where phase_id != 0 group by stat_id, phase_id) as sq group by c”, and (right panel): “select c, count(c) from (select naturraumgruppen_id, phase_id, count(distinct obs_year) as c from pheno_nr_ts group by naturraumgruppen_id, phase_id) as sq group by c”

When the Natural Regions perspective is selected in the start menu (Fig. 1), the user is presented with an interface in which plant, phase and Natural Region can be selected (Fig. 5).

Fig. 5

Natural Regions perspective. Example for the phase ‘bud burst’ of beech in the Natural Region ‘Rhein-Main Tiefland’ (screenshot). Balloons indicate the geographical centre of the respective Natural Region. Clicking on a balloon displays the name of the Natural Region

As before, plant, phase and Natural Region can be selected via drop-down menus; Natural Regions can also be selected by clicking on the map. From this perspective, combined time series can be displayed with error bars (Fig. 6). Again, a trend line can optionally be displayed.

Fig. 6

Plot of the time series for the phase ‘bud burst’ of beech in the Natural Region ‘Rhein-Main Tiefland’. The colour code of the data points indicates the source of the corresponding observation (screenshot). The error bars are the lower and upper 95 % confidence limits of the estimated mean day of year (dots), which are displayed in the corresponding table (L95CL and L95UL, respectively). The number of observations per year (n_obs) in the table indicates the number of observations/stations from which the combined mean for each year was calculated

The origin of the combined data is colour coded as above, with the extension that a combined data point can be estimated from observations in more than one database. The corresponding table also shows the number of observations for each combined data point.

SQL access

Through the ‘SQL access’ tab (see Fig. 1), the database can be queried with SQL statements, allowing all kinds of individual queries. For example, the data for each panel of Fig. 4 can be extracted with a single SQL statement (see the caption of Fig. 4). For the summary Tables S1–S3 we also provide the respective SQL statements, as examples of the flexibility and range of possible queries.
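To illustrate what the left-panel query of Fig. 4 computes, the sketch below runs it against a small invented table in an in-memory SQLite database. SQLite is used only as a convenient stand-in, since the database engine behind PPODB is not specified in this text. Only the table and column names from the Fig. 4 caption are used, and an `order by c` clause was added so that the result order is deterministic.

```python
import sqlite3

# Invented rows in a toy stand-in for PPODB's all_pheno_obs table; only the
# columns used by the Fig. 4 query are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_pheno_obs (stat_id INTEGER, phase_id INTEGER, obs_year INTEGER)"
)
rows = [
    (1, 5, 1951), (1, 5, 1952), (1, 5, 1953),  # series with 3 distinct years
    (1, 7, 1951), (1, 7, 1951),                # repeated year is counted once
    (2, 5, 1960), (2, 5, 1961),                # series with 2 distinct years
    (3, 5, 1951), (3, 5, 1952),                # another 2-year series
    (2, 0, 1960),                              # excluded by "phase_id != 0"
]
conn.executemany("INSERT INTO all_pheno_obs VALUES (?, ?, ?)", rows)

# Fig. 4 (left panel) query, with "order by c" added for a stable result.
query = """
select c, count(c) from (
    select stat_id, phase_id, count(distinct obs_year) as c
    from all_pheno_obs
    where phase_id != 0
    group by stat_id, phase_id
) as sq
group by c
order by c
"""
histogram = conn.execute(query).fetchall()
print(histogram)  # [(1, 1), (2, 2), (3, 1)]
```

Each result row (c, count) states how many station–phase time series contain observations in exactly c distinct years.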

Discussion

The joint databases made available with PPODB make an important data source accessible for further analyses of long-term changes in phenology.

The database is unique insofar as it covers more than a century of observations for a large geographical region while also including a substantial number of species and many observation stations. It complements another online phenological database, the pan-European phenology database PEP725 (Koch et al. 2009) (www.pep725.eu), which is also unique in that it partly covers countries other than those in PPODB, for which it holds more contemporary data. PEP725 observations start as early as 1868, but only for the relatively small region of the Netherlands. PEP725 offers downloadable observations per species and country, with observations, station descriptions and phase descriptions provided in separate files. With PPODB we provide an instrument that includes an SQL interface to the complete database, which greatly facilitates the retrieval of both summary information and very specific, focused information, and which could potentially be used to improve access to the PEP725 database. In addition, the data of the HIS and HPDB databases can be combined with PEP725 data for other countries, especially Poland, to construct long-term combined time series. Moreover, the combined time series for the Natural Regions of Germany provide a unique source of reliable long-term phenological time series for a range of species and phenological phases.