Introduction

The alteration of forest landscapes, driven by agriculture and the needs of economy, has been identified as a primary factor affecting populations of forest-dwelling species at regional and local scales (Saunders et al. 1991; Fahrig 2002; Ewers and Didham 2006; Mazgajski et al. 2010; Bennett et al. 2014). Certain key forest birds species e.g. Chaffinch Fringilla coelebs and/or Tree pipit Anthus trivialis, (Kuczyński and Chylarecki 2012) have already suffered a long-term population decline as a result of habitat loss, fragmentation and increased isolation of woodland patches (Fahrig 2003; Mazgajski et al. 2010; Kajtoch et al. 2012; Żmihorski 2012). Further declines of other bird species are very probable, because increasing timber production is connected with the intensification of forest management (Sallabanks et al. 2000; Perry and Thill 2013). However, not all forest bird species are equally vulnerable to forest management practices and observed patterns of population changes differ across space, hampering our understanding of exact mechanisms driving the observed changes on the trans-national scale (Reif et al. 2008; Perry and Thill 2013). Therefore, studying relationships between population density and environmental factors on large geographical scales is one of key elements enhancing conservation strategies in the European Union as a whole (Sanderson et al. 2009). The most adequate data are those obtained directly from field measurements. However, covering large areas solely by fieldwork is basically unrealistic (Spanhove et al. 2012). That is why conservationists pay a lot of attention to developing methods for collecting and analyzing data from a monitoring scheme within a large area so they can monitor population health of many bird species (Stachura-Skierczyńska et al. 2009; Spanhove et al. 2012; Brambilla et al. 2013; Buchanan et al. 2013; Linden and Roloff 2013; Walczak et al. 2013).

In the last few years many studies have focused on species distribution modelling (e.g. Franklin 2009; Brambilla et al. 2013; Guisan et al. 2013; Kosicki and Chylarecki 2013, 2014; Stachura-Skierczyńska et al. 2009) based on national and regional field monitoring programmes, e.g. the Pan-European Common Bird Monitoring Scheme (Gregory et al. 2007; Kosicki and Chylarecki 2013, 2014). Many of these studies use publicly available generalized Geographic Information Systems (GIS) datasets (i.e. Corine land cover, NDVI dataset, WordClim) as predictors (e.g., Giordano et al. 2010; Kosicki and Chylarecki 2013; Morelli and Tryjanowski 2014). However, such data are only suitable for some opportunistic species that have habitat requirements with sharp boundaries in the landscape, e.g. the Coal Tit Periparus ater (Da Silva et al. 2012), Common Chiffchaff Phylloscopus collybita (Lampila et al. 2009) and Wood Nuthatch Sitta europea (Kuczyński and Chylarecki 2012). In case of more specialized species, other environmental characteristics, i.e. small-scale structural elements or habitat variety might influence their habitat selection decisions. However, this kind of detailed information cannot be obtained from generalized remote sensing-based data sources (Akçakaya 2001; Giordano et al. 2010). On the large geographical scale, e.g. the country, gaining data on small-scale structural elements within habitats is only possible in case of closed schemes, e.g. forest inventory database (Rego et al. 2004). This kind of data can be very useful for ecological research and conservation purposes on the national and transnational level (Stachura-Skierczyńska et al. 2009; Skierczyński et al. 2013, Stachura-Skierczyńska and Kosiński 2014) but have not been sufficiently employed in the species distribution model (SDM) of density.

In this study we analyze spatial distribution patterns of the Goldcrest Regulus regulus and the Firecrest Regulus ignicapilla, two forest breeding bird species sharing partially an ecological niche and recently displaying a decreasing population trend (Chylarecki and Jawińska 2007; Kuczyński and Chylarecki 2012). Both species are widespread across Europe, inhabiting coniferous and mixed forests. The Goldcrest prefers the spruce Picea abies and the fir Abies alba, while the Firecrest is more opportunistic and additionally chooses habitats with deciduous trees, i.e. the beech Fagus silvatica (del Hoyo et al. 2006; Kralj et al. 2013). The effect of other tree species—mainly the pine, being one of the most abundant tree species in large parts of Europe—as well as the forest’s age and structure on the density and distribution of both Regulus species is ambiguous. An earlier study showed that pine forests might have either negative or positive effect on both species (Purroy 1974a, b; Lebreton and Thevenot 2009). The most recent studies, (Kuczyński and Chylarecki 2012; Kralj et al. 2013) did not address this issue, as they did not examine variables related to the forest structure.

In our study, we analysed factors shaping the Goldcrest’s and Firecrest’s densities, however, apart from environmental variables derived from remote sensing data we also used more detailed information on the forest composition, e.g. tree species diversity, the number and age of trees as well as forest vertical structure.

In this way we were able to capture more detailed relationships between the species’ densities and environmental conditions on a large spatial scale (Saveraid et al. 2001; Gibson et al. 2004; McPherson and Jetz 2007; Müller et al. 2009). A recent study also indicated that the occurrence of co-existing species might be an important predictor for SDM, but this kind of information is rarely used (Kissling et al. 2012; Morelli and Tryjanowski 2014).Therefore, we involved densities of both species as additional predictors in SDMs in order to test the usefulness of one species’ density as the surrogate of other species’ densities on a large geographical scale.

We used data from Poland which is particularly suitable for this kind of analyses, given a large area of forests (9163.8 thousand ha—Anon 2013) and the existing gradient of forest structure across the country (Anon 2013). We capitalize on the existing data from a countrywide survey of common breeding birds, yielding presence/absence data for >900 study plots representative for the country (Kuczyński and Chylarecki 2012). Presence/absence data recorded during planned surveys are clearly preferable to presence-only data in SDM studies due to a multitude of reasons (Franklin 2009; Royle et al. 2012).

Having access to high quality data on bird occurrences, coupled with existing environmental data from other sources, the main goals of this study are to: (1) determine the extent to which generalized remote sensing-based datasets are able to predict the distributions of the two Regulus species in comparison to detailed forest inventory data from ground measurements; and (2) compare the Goldcrest’s and the Firecrest’s densities.

Materials and methods

Bird data

The Goldcrest and Firecrest density data were derived from the common breeding birds monitoring scheme (Chylarecki and Jawińska 2007) and collected in Poland in years 2000–2013 in 970 grid cells of 1 square km (see Appendix A, Fig. S1). Squares treated as survey plots had been chosen at random out of 311,664 squares covering whole Poland. In a particular breeding season each plot was surveyed twice. The first visit took place between 10 April and 15 May and the second between 16 May and 30 June. Each survey started between the dawn and 9 am and took about 90 min. The route of bird census consisted of two parallel one-by-one-kilometre sections (transects), along an east–west or north–south axis. Each transect was divided into five 200-meter sections, in which birds were noted in three distance categories (<25 m, 25–100 m, >100 m). In the analysis we only use squares which were inspected twice in breeding season, however due to the fact that surveys were carried out by volunteers, the majority of plots were not checked every year. During the 14-year span each square was inspected on average 6.5 times (SD = 3.9).

As the number of individuals in the given grid cell we used the highest recorded number of individuals during one of the two inspections. Finally, both species are very small and rather shy during breeding season so in the analysis we only considered birds which had been seen or heard in only two distance categories.

Environmental data

For modelling purposes, we used several environmental variables related to topography, climatic conditions, land cover (general), vegetation indices and forest stand structure (Table 1). All datasets were converted into GRASS GIS file format (Neteler and Mitasova 2008) with the grid size of 1 km2 (corresponding to bird survey plots) and re-projected to coordinate system EPSG4284. projection (http://spatialreference.org/ref/epsg/4284/).

Table 1 Environmental predictors used for the model design

Altitudes above sea level (a.s.l.) were obtained from the digital evaluation model dataset (GTOPO30, resolution 1 km2), originally provided by the U.S. Geological Survey’s EROS Data Center (Sioux Falls, South Dakota). For modelling purposes we used both absolute altitudes and relative differences between maximal and minimal altitudes per plot.

Climate data were obtained from the WorldClim database (http://www.worldclim.org), which is a set of global climate layers (climate grids) with 1 km2 spatial resolution. Compared to previously used large-scale climate data, this dataset has the highest spatial resolution (Hijmans et al. 2005) and has been successfully used as a predictor variable for modelling species’ spatial distribution (e.g. Horn et al. 2012). We used the following six variables which determined the variability of temperature and precipitation in Poland: Annual Mean Temperature, Mean Temperature of Warmest Quarter, Mean Temperature of Coldest Quarter, Annual Precipitation, Precipitation of Warmest Quarter, and Precipitation of Coldest Quarter.

We used the CORINE Land Cover database Level 3, available at 100 meter resolution and categorized using 44 land cover classes, out of which 31 were distinguished in Poland. The database, updated in 2006, was based on Landsat TM, IRS and SPOT-4 images (EEA 2007). In this study we found on average 11 types of land cover per plot. The most frequent were non-irrigated arable land, meadows, complex cultivation patterns, agricultural areas with natural vegetation, shrub, and inland marshes. We did not use categories depicting urban areas and coniferous and deciduous forests from the original CORINE dataset, since more detailed data were available. In order to estimate the density of human settlements, we used images of lights at night. These datasets come from remote sensing (http://svs.gsfc.nasa.gov/vis/a030000/a030000/a030028/index.html) and the lights on the night map include cities, towns, and other sites with persistent lighting. This kind of data is highly correlated with the population level (Small et al. 2005; Doll et al. 2007). General types of the forest cover were obtained from the Joint Research Centre (http://forest.jrc.ec.europa.eu/) and based on remote sensing. The dataset of 25 m resolution includes two types of forest: coniferous and deciduous. Higher resolution allows detecting small patches or stripes of woodland, not distinguished in CORINE database.

We also obtained the normalized differences vegetation index (NDVI), derived from the SPOT dataset (http://free.vgt.vito.be/). NDVI is an index of green vegetation level, and is expressed as a mean monthly NDVI value (from March to June).

Forest stand structure information comes from the forest inventory database, consisting of stand-level numeric maps and associated attribute data. The database is owned and managed by the National Forest Holding “State Forests”. Full access to data is restricted and regulated by specific project-related agreements between State Forests and particular institutions.

Stands are assumed to be homogenous spatial units. For each stand, the database contains a detailed description of the canopy (all tree species, their age and estimated percentage cover). The information was obtained from ground measurements, supplemented by high-resolution aerial photos and it was updated regularly, depending on the local schedule of management activities (usually once a year, but some records might be older). We used the following variables based on the stand description: the number of tree species, percentage cover of particular trees, age of trees and forest diversity in vertical space (number of levels) (see Appendix A, Table S1).

Spatial resolution of this data is lower than 1 km2 bird survey plot. All variables are expressed as a weighted mean, where the weight is the area of spatial units.

Data processing and analysis

All statistical analyses were performed using the statistical package R (R Development Core Team 2010). The mean densities (for all years of study) of Goldcrests and Firecrests expressed as the number of birds in each grid cell (maximum number of birds recorded during visits) were respectively 0.60 individuals/km2 (95 % CL = 0.47–0.73, n = 970) and 0.16 individuals/km2 (95 % CL = 0.11–0.20, n = 970). Due to the fact that bird densities in plots depended on the transect’s length and the distance from the observer (Krebs 1999), the Hayne estimator of species density was calculated for each cell according to the equation below (Hayne 1949; Krebs 1999):

$$\widehat{D}_{H} = \frac{n}{2L}\left( {\frac{1}{n}\sum {\frac{1}{{r_{i} }}} } \right)$$

where, \(\widehat{D}_{H}\) is Hayne’s estimator of density, n is the number of animals seen, L is the length of transect, r i is sighting distance to each animal i according to two distance classes, i.e.: 25 and 50 m.

In order to avoid multicollinearity among environmental variables, principal components’ analysis (PCA) was performed with Varimax normalized rotation, separately for each of the two environmental datasets, i.e. climate and habitat (Quinn and Keough 2002). Principal components’ axes with eigenvalues >1 were retained as predictor variables in the analyses.

PCA of climate variables produced two axes, which explained 83.7 % of the original variation in climate variables (see: Table 1 and Appendix A, Table S2A).

Habitat variables, which were derived from the corine land cover (CLC), HUMAN and the type of forest coming from the Joint Research Centre, produced seven components and explained 88.1 % of the variation (see: Table 1 and Appendix A, Table S2B). PC7 has a low value of the explained variance and is not included in the analysis.

Pearson correlation coefficients and the variance inflation factor (VIF, using HH library in R; Heiberger 2013) were used to assess relationships between predictors which came from two separate PCA analysis and NDVI variable, geographical coordination and topography (Appendix A, Table S3). VIF of 5 and above indicated a multicollinearity problem (O`Brien 2007). In the present study, VIF ranged from 1.6 to 12.9, therefore predictors whose VIF ≥ 5 were excluded from the analysis (see Appendix A, Table S3).

We employed the generalized additive model (GAM) to fit resource selection functions (Hastie and Tibshirani 1990). The Hayne estimator of species’ density was used as the response variable. Fourty-four variables extracted by PCAs, geographical variables (longitude, latitude, altitude and denivelation), NDVI from March to April forest stand structure information were used as predictors (Table 1). We developed three sets of GAM: (1) models based on coarse data obtained from remotely sensed images (topographic, climatic, general land cover, vegetation indices); (2) models based on detailed forest inventory data derived from ground measurements; and (3) models based on both datasets, i.e. remotely sensed images and forest structure provided by the forest inventory database. The most parsimonious model was selected using the Akaike information criterion (mgcv library in R; Wood 2013) with the lowest AIC and consequently the highest Akaike weight (Burnham and Anderson 2002). We analysed all possible models (2n, where n = number of variables), using MuMIn library in R (Hastie and Tibshirani 1990; Bartoń 2013). Although this approach is criticized, the analysis of all possible models is often used when there is not enough a priori information to develop a small set of models (Whittingham et al. 2005; Reino et al. 2010). The probability of including a variable in the best parsimonious model was estimated as the relative importance (RI) by summing the Akaike weights of all candidate models in which the variable was included (Burnham and Anderson 2002; Reino et al. 2010). As a measure for the best model we used evidence ratio (Burnham and Anderson 2002). In order to allow some complexity in the functions, while avoiding over-fitting the data, we defined the basis dimension, i.e. k = 4 (Santana et al. 2012). The Gaussian distribution of errors and the identity link function were applied. As a measure of deviance reduction, we used the D 2 coefficient, which was equivalent to R 2, well known from least squares estimation (Weisberg 1980). Then, from the best model for each species and each of the three sets of predictors’ variables we created and presented predictions of distributions. The correlation coefficient between the predicted versus the observed densities (log-transformed) was used as a measure of error prediction (Hastie et al. 2008; Kuczyński et al. 2009).

Results

Population size

Breeding populations of Goldcrests and Firecrests were recorded respectively in 36.8 and 15.4 % of grid cells. The mean density of the Goldcrest expressed as the Hayne estimator was 1.51 (95 % CL: 1.31–1.71) individuals/1 km2, while the mean Hayne estimator on occupied plots was 4.12 (95 % CL: 3.7–4.5) individuals/1 km2. In turn, the mean density of the Firecrest also expressed as the Hayne estimator was 0.45 (95 % CL: 0.34–0.52) individuals/1 km2, while the mean Hayne estimator on occupied plots was 2.9 (95 % CL: 3.7–4.5) individuals/1 km2. We found a correlation between the raw density of the Goldcrest (expressed as the number of birds in each grid cell) and the raw density of the Firecrest (r = 0.29, P < 0.001, n = 970), and then between the Hayne estimator of Goldcrests’ and Firecrests’ densities (r = 0.58, P = 0.001, n = 970). These correlations are not simply indications that both species are found only in the forested squares, as there was also a significant correlation even when considering data only from those squares where at least one of the species was present (raw density r = 0.26, P < 0.001 and the Hayne estimator r = 0.61, P < 0.001).

Habitat use by Goldcrest and predictive map of \(\widehat{D}_{H - G}\)

The most parsimonious model of GAM (Table 2, model A2) based only on the remotely sensed coarse data and the fit to the Hayne estimator of the Goldcrest’s density \((\widehat{D}_{H - G})\) included a smooth fit to LATITUDE (RI = 1, F = 18.57, P = 0.0001), ALTITUDE (RI = 1, F = 10.64, P = 0.0001), CFORFARML (RI = 1, F = 6.82, P = 0.0001), CODEFOR (RI = 1, F = 2.17, P = 0.0008), NDVI.MAR (RI = 1, F = 2.68, P = 0.0003) and TEMP (RI = 0.679, F = 2.08, P = 0.003). This model was definitely better for describing the variation of \(\widehat{D}_{H - G}\) than the second model from our candidate set (evidence ratio = 2.08, Table 2, model A3).

Table 2 The comparison of GAM models based on the standardized Goldcrest \(\widehat{D}_{H - G}\) and Firecrest \(\widehat{D}_{H - G}\) density (null and all where Akaike weights ≥0.001) differing in numbers of predictors, linearized or included predictors as a polynomial spline

In turn the most parsimonious model of GAM (Table 2, model B2) based only on the detailed data on the forest structure and the fit to the Hayne estimator of the Goldcrest’s density \((\widehat{D}_{H - G})\) included a smooth fit to DOM.FIR (RI = 1, F = 3.89, P = 0.002), AGE.FIR (RI = 0.709, F = 3.91, P < 0.001), DOM.PINE (RI = 1, F = 1.35, P = 0.003) and NO.SPEC (RI = 1, F = 4.67, P < 0.0001) and linear fit to DOM.SPRUCE (RI = 1, F = 2.01, P = 0.001) and AGE.SPRUCE (RI = 0.647, F = 3.11, P = 0.001). This model was slightly better for describing the variation of \(\widehat{D}_{H - G}\) than the second model (evidence ratio =1.008, Table 2, model B3).

However, the most parsimonious model, which apart from the coarse data used the forest structure and the Firecrest’s density \((\widehat{D}_{H - F})\) as additional predictors, fitted to \(\widehat{D}_{H - G}\) (Table 2, model C2) and revealed a smooth fit to CFORFARML, DOM.FIR, AGE.FIR, NO.SPEC, NDVI.MAR and PREC and a linear fit to ALTITUDE, LATITUDE, TEMP, DOM.SPRUCE, CODEFOR, AGE.SPRUCE and DOM.PINE. This model was definitely better than the second model (evidence ratio =37.3, Table 2, model C3).

In this case, the most important predictors based on RI included two variables capturing the main habitat gradients, i.e. CFORFARML (RI = 1, F = 13.07 P = 0.0001; Fig. 1a) and CODEFOR (RI = 1, F = 12.38, P = 0.0001; Fig. 1b). The first of them was characterized by a non-linear shape, while the second by a linear shape of the response function and represented the habitat gradient from large (CFORFARML) and small (CODEFOR) coniferous forests to (respectively) arable fields and deciduous forests. The highest values of \(\widehat{D}_{H - G}\) were found in plots where coniferous forests dominated. The next important variables described the forest stand structure. The highest \(\widehat{D}_{H - G}\) was noted in plots where the spruce (DOM.SPRUCE, RI = 1, F = 15.64, P = 0.0001; Fig. 1c) and the fir (DOM.FIR, RI = 1, F = 12.05, P = 0.0007; Fig. 1d) dominated, but in both cases old trees were clearly preferred, as shown by corresponding variables AGE.SPRUCE (RI = 1, F = 1.24, P = 0.003; Fig. 1e); and AGE.FIR (RI = 1, F = 4.84, P = 0.0007; Fig. 1f). The species also avoided the pine (DOM.PINE, RI = 1, F = 4.23, P = 0.01; Fig. 1g) and preferred forests with a low number of tree species (NO.SPEC, RI = 1, F = 1.97, P = 0.01; Fig. 1h). Then, NDVI.MAR (RI = 0.972, F = 2.10, P = 0.0001; Fig. 1i) showed that the species’ density increased in areas where the vegetation season began early. The next variables were TEMP (RI = 0.998, F = 1.12, P = 0.003; Fig. 1j) and PREC (RI = 0.972, F = 12.09, P = 0.0001; Fig. 1k) that indicated that the Goldcrest favoured relatively low temperatures and high precipitation. Equally important variables were ALTITUDE (RI = 0.998, F = 2.84, P = 0.0001; Fig. 1l), and LATITUDE (RI = 0.972, F = 1.52, P = 0.0001; Fig. 1m), revealing also a non-linear shape of the response. The highest Goldcrest’s density corresponded to higher elevations especially in the north of Poland. With the output of this model as the base, we created a predictive map of \(\widehat{D}_{H - G}\) (Fig. 2a).

Fig. 1
figure 1

GAM fit for the Goldcrest density. a CFORFARML represents the gradient from a large area of coniferous forest (C.forest) to arable fields (Farmland); b CODEFOR represented a habitat gradient from small area of coniferous forest (S.c.forest) to large area of deciduous forest (D.forest); c DOM.SPRUCE, d DOM.FIR, e DOM.PINE reflect forest where share of (respectively) spruce fir and pine are low (L.share) and where these species dominated (H.share); f AGE.SPRUCE, g AGE.FIR reflect forest where occur young spruce and fir (Young), while (Old)—shows forest where dominated old these species; h NO.SPEC represented the number of tree species (L.number)—areas with low number of tree species, while (H.number)—areas with high number of species; i NDVI.MAR reflect area where green vegetation in March is low (Low) to area where vegetation in march is high (High); j TEMP represented the temperature gradient, (Low)—areas areas with low temperature, while (High)—areas with high temperature; k PREC represented the level of precipitation (Low)—areas with low level of precipitation, while (High)—areas with high level of precipitation; l ALTITUDE represented a geographical gradient from area of low elevation (Low) to area of high elevation (High); m LATITUDE represented geographical gradient from south (South) to north (North). The y-axis: density—(s) smoother function with estimate degrees of freedom in parenthesis (Hastie and Tibshirani 1990; Brown 2011). The shaded areas represent standard errors of the estimate curves

Fig. 2
figure 2

Predictive maps of Hayne estimator of Goldcrest (a) and Firecrest (b) density

Habitat use by Firecrest and predictive map of \(\widehat{D}_{H - F}\)

Based on the remotely sensed coarse data only (the same dataset as in \(\widehat{D}_{H - G}\)) and the fit to the Hayne estimator of the Firecrest’s density \((\widehat{D}_{H - F})\), the model of GAM (Table 2, model D2) included a smooth fit to ALTITUDE (RI = 0.994, F = 3.54, P = 0.0001), and LATITUDE (RI = 0.926, F = 2.22, P = 0.0001); and a linear fit to CODEFOR (RI = 1, F = 2.32, P = 0.001), FARMLMFOR (RI = 1, F = 3.11, P = 0.0001), TEMP (RI = 0.999, F = 2.87, P = 0.0001), DMFOR (RI = 0.994, F = 1.97, P = 0.001) and PREC (RI = 0.627, F = 1.54, P = 0.004). This model explained the variation of \(\widehat{D}_{H - F}\) better than the second model in our candidate set (evidence ratio 1.46, Table 2, model D3).

In turn the most parsimonious model of GAM (Table 2, model E2) based only on the detailed data on the forest structure and the fit to the Hayne estimator of the Firecrest’s density \((\widehat{D}_{H - G})\) included a smooth fit to DOM.BEECH (RI = 1, F = 2.07, P = 0.001), DOM.SPRUCE (RI = 0.972, F = 1.42, P = 0.003), DOM.FIR (RI = 0.991, F = 3.09, P < 0.001), AGE.FIR (RI = 1, F = 2.99, P < 0.001), AGE.LARCH (RI = 1, F = 1.66, P = 0.003) and DOM.PINE (RI = 0.991, F = 1.99, P = 0.001) and linear fit to NO.SPEC (RI = 0.994, F = 1.22, P = 0.001). This model was definitely better for describing the variation of \(\widehat{D}_{H - F}\) than the second model (evidence ratio =87.36, Table 2, model E3).

However, the most parsimonious model (Table 2, model F2) which, apart from the coarse data, used the forest structure data (the same dataset as in \(\widehat{D}_{H - G}\)) and the Goldcrest’s density (\(\widehat{D}_{H - G}\)) included a smooth fit to FARMLMFOR, \(\widehat{D}_{H - G}\), DOM.BEECH, DOM.SPRUCE LATITUDE, ALTITUDE, TEMP, PREC and a linear fit to CODEFOR, AGE.SPRUCE, DOM.PINE, AGE.FIR, AGE.LARCH to DOM.SMAP AGE.ELM and NO.SPEC. This model was slightly better than the second model in our candidate set (evidence ratio 1.002, Table 2, model F3). In this case, FARMLMFOR (RI = 1, F = 3.22, P = 0.0001; Fig. 3a) and CODEFOR (RI = 1, F = 1.45, P = 0.000; Fig. 3b) turned out to be the most important factors. All of them represented the habitat gradient from arable fields (FARMLMFOR) and a small area of coniferous forest (CODEFOR) to small and large areas of mixed and deciduous forests. The third component, \(\widehat{D}_{H - G}\) (RI = 1, F = 16.14, P = 0.0001; Fig. 3c) indicated that high densities of Firecrests were noted in areas with high densities of Goldcrests. Next important variables were DOM.BEECH (RI = 1, F = 16.9, P = 0.00001; Fig. 3d) and DOM.SPRUCE (RI = 1, F = 1.9, P = 0.0002; Fig. 3e). All of them showed that the highest \(\widehat{D}_{H - F}\) corresponded to areas where the beech and spruce dominated. An old spruce tree was clearly preferred as shown by the following variable: AGE.SPRUCE (RI = 1, F = 2.01, P = 0.0007; Fig. 3f). At the same time the Firecrest avoided the pine (DOM.PINE, RI = 1, F = 3.12, P = 0.0001; Fig. 3g) and preferred a single old fir (AGE.FIR, RI = 1, F = 1.98, P = 0.0001; Fig. 3h) and an old larch (AGE.FIR, RI = 1, F = 1.44, P = 0.001; Fig. 3i). Other important factors included NO.SPEC (RI = 1, F = 3.01, P = 0.02; Fig. 3j), showing that Firecrests preferred a multi-species forest. The last variables included LATITUDE (RI = 0.685, F = 1.29, P = 0.0001; Fig. 3k), ALTITUDE (RI = 0.685, F = 5.12, P = 0.0001; Fig. 3l), TEMP (RI = 0.899, F = 2.19, P = 0.0001; Fig. 3m) and PREC (RI = 0.899, F = 1.99, P = 0.0001; Fig. 3n). All of them presented a non-linear shape of the response function, showing that the highest density was noted in upland, both in southern and northern parts of Poland, where the climate was relatively cold and wet. According to this model we created a predictive map of \(\widehat{D}_{H - F}\) (Fig. 2b).

Fig. 3
figure 3

GAM fit for the Firecrest density. a FARMLMFOR is related to the gradient from arable fields (farmland) to small and large areas of mixed forest (M.forest); b CODEFOR represented a habitat gradient from small area of coniferous forest (S.c.forest) to large area of deciduous forest (D.forest); c Ď H-G showing areas with low density of the Goldcrest (Low.d) and areas with high density (High.d); d DOM.BEECH, e DOM.SPRUCE, f DOM.PINE reflect forest where share of (respectively) beech fir and pine are low (L.share) and where these species dominated (H.share); g AGE.SPRUCE, h AGE.FIR, i AGE.LARCH reflect forest where occur young spruce and fir (Young), while (Old)—shows forest where dominated old these species; j NO.SPEC represented the number of tree species (L.number)—areas with low number of tree species, while (H.number)—areas with high number of species; k LATITUDE represented geographical gradient from south (South) to north (North); l ALTITUDE represented a geographical gradient from area of low elevation (Low) to area of high elevation (High); m TEMP represented the temperature gradient, (Low)—areas areas with low temperature, while (High)—areas with high temperature n PREC represented the level of precipitation (Low)—areas with low level of precipitation, while (High)—areas with high level of precipitation. The y-axis: density—(s) smoother function with estimate degrees of freedom in parenthesis (Hastie and Tibshirani 1990, Brown 2011). The shaded areas represent standard errors of the estimate curves

Discussion

Species distribution models (SDMs) are developed in order to identify ecological requirements of species and to predict their spatial density (Naimi et al. 2011). From the practical point of view, SDMs can provide valuable support for conservation efforts (Jetz et al. 2012) both at local, regional, and trans-boundary scales (Margules and Pressey 2000; Addison et al. 2013). SDMs can use either presence-absence or presence-only occurrence data of species of interest, but most of them use publicly available, remote sensing-based datasets as environmental predictors (Guisan et al. 2013). Such datasets, due to their generalization and/or coarse resolution, might be insufficient for the task of identifying most important environmental factors for certain species (Morelli and Tryjanowski 2014), especially the specialized ones.

In this study, we tested the usefulness of generalized remote sensing data as predictors of two Regulus species. Our first step was to develop separate prediction models for Goldcrests’ and Firecrests’ densities using the same set of remote sensing-based predictors. The Goldcrest responded positively to the increasing area of coniferous forest, while the Firecrest preferred mixed and deciduous forests. Moreover, the predicted densities of both species increased in uplands and in cold and wet climate of northern (especially north-east for both species as well as north-west for the Firecrest) and southern parts of Poland. For both species the relationship was obvious (Ulfstrand 1976; Tiainen et al. 1983; del Hoyo et al. 2006; Kralj et al. 2013), but it should be noted that open-access datasets like the Corine Land Cover or remote sensing-based forest cover maps do not contain information about the species’ composition, only about general forest types, such as coniferous vs. deciduous. In Poland, similarly to many Central European countries, the most abundant conifer species is the pine Pinus silvestris. Therefore, using only open source habitat data as a predictor might lead to a conclusion that high densities of Goldcrests and Firecrests depend on the prevalence of the pine (see also Fig. 1g and 3g). Indeed, in the Pyrenees both Goldcrests and Firecrests are recorded in pine forests (Purroy 1974a, b), but this is not the case elsewhere (Lebreton and Thevenot 2009). Therefore, models that use only open access generalized data do not provide complete information regarding the species’ preferences towards specific forest conditions.

Our results also show that the Goldcrest prefers forest areas with high NDVI values at the beginning of spring while the Firecrest does not exhibit such preferences. This might be explained by the fact that the Goldcrest favours coniferous forests that are green all year round (especially in winter). The Firecrest is abundant in mixed forests that have a high NDVI rate both in early spring, when deciduous trees are still leafless, and in late spring when both deciduous and coniferous trees appear green. According to our results, both species are abundant in the uplands. These relationships reflect a general pattern of species distribution in Europe. On the scale of the continent the Goldcrest prefers middle and high altitudes (Hagemeijer and Blair 1997; del Hoyo et al. 2006), while the Firecrest is particularly abundant in only mountainous forests in southern parts of Europe (Leisler and Thaler 1982; Telleria and Santos 1994; Hagemeijer and Blair 1997).

The second group of models based on the detailed data on the forest structure proved to be definitely better than the first group of models (Goldcrest Δ AIC = 77.3, Firecrest Δ AIC = 34.9), and differences from the third group of models where both generalized and detailed environmental data were used as a predictors, were comparatively small (Goldcrest Δ AIC = 16.6, Firecrest Δ AIC = 4.6). Distributions of both Goldcrests and Firecrests were best described by a combination of features measured by remote sensing and by on-the-ground measurement of the habitat; the ground-measured habitat features played a greater role in predicting distributions of these two species. Similar patterns were found in connection to other species, e.g. the Hazel Grouse Bonasa bonasia (Müller et al. 2009); the Rufous Bristlebird Dasyornis broadbenti (Gibson et al. 2004) as well as in bird species richness in the Yellowstone National Park (Saveraid et al. 2001). However, the studies were carried out on a relatively small research area (e.g. 97 km2 where it was quite easy to count the number of small-scale habiat elements); and-or closed areas, i.e. National Parks where small-scale habitat elements were frequently recorded. In our case (study area >300.000 km2) we used data from the forest inventory database. The primary aim of the database was to describe the main features of forests in terms of size, condition and change of growing trees, focusing mainly on productive aspects, namely wood resources (Rego et al., 2004). Nevertheless, the forest stand description can be very useful for modelling species’ occurrence (Stachura-Skierczyńska et al. 2009; Stachura-Skierczyńska and Kosiński 2014), implementing monitoring schemes (Skierczyński et al. 2013), and as shown by our study on species distribution modelling of density specialized species.

The Goldcrest reached the highest density in areas with a high share of old spruce forest with fir. It generally avoided pines and preferred forests with a low number of tree species. The Firecrest chose mixed forests, dominated by beech and old spruce with an admixture of single old fir and larch trees. Firecrests also avoided pines, but favoured forests with a high number of tree species. These different habitat choices can be explained by the existing differences in the morphology and foraging behaviours of these species (Leisler and Thaler 1982; Kralj et al. 2013). Goldcrests more often forage by climbing vertically and hanging under dense spruce branches, whereas Firecrests prefer trees with less dense branches that allow them to move more rapidly and forage in a standing position (Leisler and Thaler 1982).

Furthermore, both species prefer old forests and definitely avoid pine stands. In all biomes, old-growth forests tend to have a complex structure and a variety of microhabitats characterised by different temperatures, light and humidity, which bring about a high abundance of insects, such as Hemiptera, Collembola and spiders, being major food resources for the Regulus and other bird species (Cramp 1992; Jokimäki and Solonen 2011; Rosenvald et al. 2011; Edenius et al. 2012). Moreover, old-growth forests are characterized by a relative spatial–temporal stability, ensuring long-term stable conditions for breeding bird communities (Wesołowski et al. 2006; Wirth et al. 2009; Rosenvald et al. 2011). Such inherent features are not demonstrated by young pine forests, which in most cases are poor in insect species and do not provide birds with a sufficient amount of food (Wirth et al. 2009; Rosenvald et al. 2011). Furthermore, longer needles of the pine do not allow birds to move efficiently through the branches, a drawback which also makes acquiring food difficult.

As for the climate and the topography, both turned out to be significant factors influencing the Goldcrest’s and the Firecrest’s densities. However, their importance was greater in the model based on remote sensing data only than where also detailed information about forest structure was involved. A recent study indicated the importance of climatic variables in the multiple regression model with respect to both species’ abundance (Kuczyński and Chylarecki 2012; Kralj et al. 2013). However, it should be noted that the above study was based on data from a wide longitudinal gradient (from central Spain to eastern Poland) (Kralj et al. 2013). In our study, the influence of latitude, altitude and weather conditions on Goldcrests’ and Firecrests’ densities is most probably indirect. The spruce and the fir prefer relatively cold and wet climate, therefore their geographical ranges in Poland are limited to the southern, mountainous and upland regions (both species) and north-eastern parts (spruce) where hemiboreal coniferous forests occur naturally. Low humidity in early spring is a limiting factor for the beech, which is present in the mountains and in the west of Poland, but does not occur further in the east where the climate is dryer (Zając and Zając 2001). The abundance of both studied Regulus species overlaps with the distribution of preferred trees, meaning that these small scale habitat variables are among the most important factors influencing their densities (all RI = 1).

We also found that the high density of the Goldcrest was likely to determine the high density of the Firecrest in areas of their coexistence, however, the opposite relationship was not established, i.e. the high abundance of the Firecrest did not always indicate the high abundance of the Goldcrest. The coexistence of species is possible only if their niches are not identical (Krebs 2009). Morellli and Tryjanowski (2014) point out that this relationship may be just a local phenomenon. Still, distribution ranges of both species overlap across a large area of sympatry in central Europe. Therefore, we suspect that this relationship reflects a general pattern within the European distribution range. Furthermore, from the behavioural point of view it is possible that the Firecrest uses the Goldcrest’s presence as a clue for a suitable habitat, as is the case with the Pied Flycatcher (Ficedula hypoleuca) and the Titmice (Parus spp) (Forsman et al. 2002).

In turns out that separate models based on the open source remote sensing data and/or forest inventory data derived from ground measurements can only partially explain the distribution of the Goldcrest and the Firecrest. In our case habitat (from both sources) and climate variables as well as topography explain spatial variation of both species but do not provide a complex explanation of all factors driving the species’ habitat relationships. It is highly possible that the integration different land-use types (here: small and large scale habitat data) reflects landscape complexity—one of the key elements leading to high predictive performance (McPherson and Jetz 2007). On the other hand, low predictive performance of models based only on remote sensing data can be the consequence of a non-stationary phenomenon showing different responses to the same predictor in different geographical areas (Fink et al. 2010). If Poland’s territory was divided into smaller geographic sub-regions based on biogeographic zones, for example, and separate species distribution models based on remote sensing data (or only small scale data) were constructed for these sub-regions and then averaged, predictive performance could turn out to be higher than in our approach.

Conclusion

Separate SDMs based on remote sensing data and/or detailed information on forests derived from ground measurements do not contain all important ecological factors that can affect species’ distribution on a large geographical scale. From a conservationist’s point of view, remote sensing data cannot replace the painstaking field study of habitat requirements for individual species (Storch et al. 2003), but it can be stated that the implementation of small scale habitat variables in SDMs can reduce overall costs and improve the efficiency of field studies since it allows to identify concentrations of high species densities with much higher accuracy.

In our opinion, habitat modelling for practical spatial planning and conservation purposes should not be based entirely on free data sources, but it also should incorporate as detailed data as possible. In case of forest areas, the best available source of data is the forest inventory database. Although designed for commercial purposes, namely inventory of wood resources, it turns out to be useful for studies on conservation.