1 Introduction

Predicting species distributions on the basis of species distribution modelling (SDM), often referred to as “predictive mapping”, has become a tool which is widely used not only in macroecological studies, research on niche evolution, interactions and co-evolution of species, but also in planning conservation areas (Warren et al. 2008; Esselman and Allan 2011; Drew et al. 2011; Fourcade et al. 2017). So, it comes as no surprise that the methodological aspect of developing models has been much discussed in the latest scientific literature (Fourcade et al. 2017). From the analytical point of view, the SDM is a group of statistical tools based on ecological niches theory, which describe non-linear relationships between species occurrence/density/richness and environmental layers in order to build statistical models which reflect processes that drive species distribution on a large geographical scale (Guisan and Zimmermann 2000; Elith and Leathwick 2009; Franklin 2010). Such models, projected in space or time, can predict not only species occurrence and density, range shifts under climate and habitat changes (Hijmans and Graham 2006), but also evaluate biodiversity surrogates and estimate invasive species distribution (Jimenez-Valverde et al. 2011). Thus, if the SDM has become a standard tool in macroecological studies (Guisan et al. 2013), it is vital to ensure the uniqueness of the process of machine learning methods both for species occurrence and other population and community parameters.

In recent years the number and complexity of statistical tools used to predict species distribution has grown exponentially (Hegel et al. 2010; Lobo et al. 2010). However, having analysed research results obtained with these tools, one comes to a conclusion that when varied analytical methods are not included in the SDM predictive process, the ecological value of results might be limited (Elith and Graham 2009; Elith et al. 2006; Segurado and Araújo 2004; Guisán and Theurillat 2000; Mattsson et al. 2013). For example, predictive occurrence of the European grayling (Thymallus thymallus L.) was provided by a number of different predictors coming from different models (Fukuda et al. 2013).

Therefore, we decided to compare the performance of two methods, i.e. the Generalised Additive Model (GAM) and the Random Forest (RF) to predict density of two bird species as well as two measures of bird biodiversity. From a wide spectrum of SDM methods, the GAM and the RF are considered to be among the most powerful machine learning algorithms used to predict species distribution. However, their effectiveness in projecting different levels of ecological systems, such as density and species richness, has not been fully confirmed (Araújo and New 2007; Hardy et al. 2011; Heikkinen et al. 2012; Elith et al. 2006; Wisz et al. 2008; Williams et al. 2009; Lei et al. 2011). Both algorithms can handle a large number of predictors and identify which of them are important via the out-of-bag procedure for the RF and the Relative Importance (RI) for the GAM (Breiman 2001; Hastie and Tibshirani 1990; Hastie et al. 2008). However, the output of both models can vary, because the RF creates a lot of specific models based on randomly perturbed data (Grimm et al. 2008; Lawler et al. 2006; Mutanga et al. 2012; Prasad et al. 2006; Virkkala et al. 2010), whereas the GAM constructs one “best model” (Breiman 2001, Hastie and Tibshirani 1990; Hastie et al. 2008). So, both models can detect the influence of the same predictor, but with a varying power, which in turn may suggest that relationships between predictors and species distribution may partly reflect a spatial structure of predictors, especially climate and topography, rather than a real ecological process (Bahn and Mcgill 2007; Beale et al. 2008; Chapman 2010). Furthermore, in the ecological modelling approach the problem with comparing models with one another increases along with spatial complexity (Hof et al. 2012; Carrasco et al. 2014; Seppelt and Voinov 2002; Kosicki 2018). When a lot of predictors affect species density and/or species richness in different ways (positive/negative and/or linear/non-linear), and models’ output is transferred to new areas, more rigorous tests are indispensable. For that reason, an independent test dataset should be used to evaluate the quality of the GAM and the RF so as to obtain more realistic models (Fourcade 2016). Therefore, the proposed approach should indicate whether the GAM or the RF can or cannot reveal environmental drivers underlying species distribution (Petitpierre et al. 2017). Otherwise speaking, the aim of this study is to test a hypothesis that the choice of machine learning methods is crucial for ensuring the high quality of SDMs.

Consequently, spatial distribution models of farmland (Great Grey Shrike) and forest (Great Spotted Woodpecker) breeding bird species’ densities were developed along with an analysis of spatial distributions of taxonomic (expressed as the number of bird species) and functional species richness (expressed as species occurrence linked with the ecosystem’s functioning and environmental components (Hillebrand and Matthiessen 2009; Cernansky 2017). According to a small spatial scale study, the Great Grey Shrike prefers large areas of heterogeneous farmland, where arable fields are interspersed with meadows and small forests (Antczak et al. 2004), while the Great Spotted Woodpecker favours areas dominated by large areas of all forest types (Kosiński et al. 2006, 2018). Taxonomic species richness is associated with a habitat’s arrangement, however, functional species richness is often linked with an interplay between habitat, topography and environmental conditions (Kosicki and Hromada 2018). By using two different modelling approaches, we determine whether a given procedure produces variables which are incidentally or indirectly linked to species distribution on a large spatial scale (Dormann et al. 2012) or whether the selection of relevant predictors is compatible with the species’ ecology (Petitpierre et al. 2017). Thus, a pragmatic question arises: What type of models should be used to predict density and species richness? In other words, this study aims to determine the extent to which GAM or RF models produce similar spatial predictions.

2 Materials and methods

2.1 Bird data

Bird density and the number of bird species were derived from the Common Breeding Birds Monitoring Scheme (Chylarecki and Jawińska 2007). This database is based on field data collected in Poland in years 2000–2013 in 970 1 km2 grid cells, which were randomly chosen out of 312,679 km2 squares covering the whole of Poland (Fig. 1).

Fig. 1
figure 1

Distribution of census grid cells according to the Common Breeding Birds Monitoring Scheme

In each breeding season each grid cell was surveyed twice: between 10 April and 15 May, and between 16 May and 30 June. Each survey started between the dawn and 9 a.m. and lasted about 90 min. Ornithologists-volunteers noted birds perpendicular to two parallel 1-km transects along an east–west or north–south axis, and recorded them in three distance categories (< 25 m, 25–100 m, > 100 m). During the fourteen-year span each square was inspected on average 6.5 times (SD = 3.9). As the number of individuals in a given grid cell we used the highest recorded number of individuals during either of the two inspections in each season. Then, these values were averaged out for all the years of study.

2.2 Environmental data source

For modelling purposes we used various environmental predictors related to topography, climatic conditions, land cover, vegetation indices, forest stand structure and landscape configuration metrics (see details: Appendix A). All these data came from different GIS databases, and all of them were converted into the GRASS GIS file format (GRASS Development Team 2015) with the grid size of 1 km2 (corresponding to bird survey plots) and re-projected to the co-ordinate system EPSG4284 projection (http://spatialreference.org/ref/epsg/4284/).

2.3 Data processing and analysis

Bird density in grid cells depended on the transect’s length and the distance from the observer (Krebs 1999). Therefore, the Hayne estimator of bird density was used according to the equation below (Hayne 1949; Krebs 1999):

$$ \overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\smile}$}}{D}_{H} = \frac{n}{2L}\left( {\frac{1}{n}\sum {\frac{1}{{r_{i} }}} } \right) $$

where, H = Hayne’s estimator of density, n = number of animals seen, L = length of transect, ri = sighting distance to each animal i.

The measure of avian biodiversity comprised (1) Bird species richness (BSR), expressed as the number of bird species noted in a given grid cell, and (2) Functional richness (FRic), representing the amount of functional space occupied by species in grid cells. A low value of a component shows a low number of species whose niches (functional space) do not overlap, while a high value indicates a high number of species within a given functional space whose niches overlap considerably due to the fact that particular species can use the same resources in the same way without limiting one another. Functional diversity was calculated with avian niche traits (Pearman et al. 2014) based on 52 variables that described the niche of each bird species, i.e. food types (13 variables), behaviour while acquiring food (9 variables), substrate from which food was taken (9 variables), period of the day when a species foraged actively (3 variables), and nesting habitats (18 variables). All variables were binomial 0 or 1 (Pearman et al. 2014). The calculation was performed using FD library for R (Laliberté et al. 2015).

In order to avoid multicollinearity among particular predictors, the Principal Components Analysis (PCA) was employed with the Varimax normalised rotation for two (climatic and habitat) environmental datasets (Dormann et al. 2012) (Appendix B, Table S1, S2). We used two separate PCAs for two data types to enhance the interpretation of new loadings. The PCA of climate variables produced two axes (Appendix B, Table S1) which explained 83.7% of the original variation in climate variables. Habitat variables produced eleven components (Appendix B, Table S2) and explained 85.4% of the variation.

The first axis (CFIELD-CCFOR) represents a gradient from the core of non-irrigated arable land to the core of coniferous forest. The second axis (MFIELD-CCFOR) is related to a gradient from mixed arable fields and the buffer zone of non-irrigated arable land to the core of coniferous forest. The third axis (NOVEGETAT) has the highest loadings on areas without vegetation. The fourth axis (MEAD-CFIELD) corresponds to a gradient from meadows to the core of non-irrigated arable land. The fifth axis (CDFOR-CFIELD) shows a gradient from the core of deciduous forest to the core of non-irrigated arable land. The low value of the sixth component (FRUIT) points to areas with fruit trees, while the low value of the seventh component (PEAT) reflects peats. Then, the low value of the eighth component (MARSH) reflects marshland. The ninth component (CFIELD-BBUSHE) relates to the core of non-irrigated arable land with a low value, while the high one shows the core of bushes. The tenth axis (MFOR-CBFIELD) is a gradient from mixed forest to the core of non-irrigated arable land. Finally, the low value of the eleventh axis (URB-BCFOR) shows urban areas, while a high value reflects the buffer zone of coniferous forest.

Furthermore, Pearson correlation coefficients and the Variance Inflation Factor (VIF) were applied to assess relationships between predictors which came from two separate PCA analyses, as well as other predictors not included in the PCA, i.e. topography measurements, Normalized Difference Vegetation Index (NDVI) and the shape of patches (Dormann et al. 2012) (Appendix B, Table S3). In the present study the VIF ranged from 1.1 to 22.7, so following Naimi et al. (2014) predictors with VIF ≥ 10 were excluded from the analysis.

2.4 Generalised Additive Models (GAM)

The GAM (Hastie and Tibshirani 1990), (mgcv library in R; Wood 2013; R Development Core Team 2017) is an extension of the Generalised Linear Modelling (GLM) and provides a flexible non-linear relationship between the response variable and the predictor (Austin and Meyers 1996; Guisan et al. 2002). Response variables include (1) the Hayne estimator of the Great Grey Shrike, (2) the Hayne estimator of the Great Spotted Woodpecker, (3) bird species richness, and (4) functional species richness. The most parsimonious model was selected with the Akaike information criterion with the lowest AIC and consequently the highest Akaike weight (Burnham and Anderson 2002). All possible models using the ‘dredge’ procedure, i.e. 2n where n = number of variables, were analysed with the MuMIn library in R (Bartoń 2013; Hastie and Tibshirani 1990).

The probability of including a variable in the best parsimonious model was estimated as the Relative Importance (RI) by summing the Akaike weights of all candidate models in which the variable was included (Burnham and Anderson 2002; Reino et al. 2010). Moreover, the Gamma distribution for modelling density was used, while for species richness we used the Poisson distribution. To allow some complexity in the functions, while avoiding data over-fitting, the basic dimension was defined as k = 4 (Santana et al. 2012). We used R2, expressed as percentage of the explained variance which is a quality measure of the model’s fit to the training data (Weisberg 1980). Finally, as the measure for the best model we applied: (1) evidence ratio (ER), defined as Akaike ω1/ω2 (Burnham and Anderson 2002), and (2) minimised generalised cross-validation (GCV) score which showed measure smoothness. Smaller values of GCV indicated better models fitting (Wood 2006).

2.5 Random Forest approach

The RF (Breiman 2001; RandomForest library in R Liaw and Wiener 2002; R Development Core Team 2017) was developed in a regression setting, in which—similarly to the GAM—the Hayne estimator of two species density and two avian biodiversity measures were treated as the model’s response. This method is a machine learning technique (Breiman 2001; Elith et al. 2008) linked to classification/regression trees (CART) and bagging methods (Breinman et al. 1984; Breinman 1996; Breiman 2001). Thus, instead of constructing a single tree (CART), it constructs multiple trees formed on random samples of cases by bootstrapping technique, i.e. sampling with replacement, and each split (in each tree) from the random sample of predictors. This procedure ensures that a 1/3 of the observations is left out in the fitting process and used to calculate the prediction error, i.e. out-of-bag (OOB). We examined also the RF model’s performance as Root Mean Square Errors (RMSE) and calculated the percentage of the explained variance (R2).

The RMSE and R2 measured the model’s performance when predicting population density and/or measurement of species richness in the studied grid cells. Both measures were estimated on the basis of the training dataset. On the other hand, the OOB measured the performance of predicted densities for the whole country, and it was estimated according to the test dataset. The RF procedure also provided (1) a ranking of predictors’ importance based on specified changes of mean decrease in accuracy (MSEOOB), i.e. predictors were removed one by one from the model (Berk 2008; Breiman 2001), thus revealing their adequacy; and (2) partial dependence plots which showed how each predictor was related to the response variable when other predictors were held constant (Berk 2008). Then, a selection of variables (Guyon and Elisseeff 2003) was performed with the VSURF library for R (Robin et al. 2010), which made it possible to identify the smallest number of predictors that offered the best predictive power for the final model (Kohavi and John 1997). The search function commenced with all environmental variables (n = 27), and then progressively eliminated the least promising ones. Finally, a subset of environmental parameters with the lowest RMSE was selected (Kosicki 2017).

The RF regression required defining three parameters, such as ntree (number of trees), mtry, and the nodesize. The lowest RMSE was for ntree = 1000, mtry = 1/3 of variables (default value), and the nodesize = 5 (default value) (Vincenzi et al. 2011).

2.6 Generalised Additive Models vs. Random Forest

In order to compare two different results coming from the two methods, an independent dataset was used. Before developing GAM and RF models, 20% of the observations that had not been included in the modelling procedure were randomly chosen. For model validation, we used a correlation coefficient between predictive (from the best GAM and/or RF model) and real species densities and richness measurements (20% of observations were not included in the model fitting). Thus, the higher the value of the correlation coefficient, the higher the predictive power of the model.

2.7 Spatial autocorrelation

Residuals coming from best GAM and RF models were checked for their spatial autocorrelation by using the Moran’s I statistics (Library: ape for R, Paradis 2016). The Moran’s index ranged from − 1 (negative spatial autocorrelation) to 1 (positive spatial autocorrelation), with non-significant values close to zero.

3 Results

3.1 Population size

The mean number of the Great Grey Shrike (number of birds in each grid cell) was 1.62 individuals/km2 (95% CL 1.54–1.70, n = 970), while the mean Hayne estimator of density (ĎGGS) was 0.36 individuals/km2 (95% CL 0.29–0.43). On the other hand, the mean number of the Great Spotted Woodpecker was 0.54 individuals/km2 (95% CL 0.49–0.59, n = 970), while the mean Hayne estimator of its density (ĎGSW) was 1.66 individuals/km2 (95% CL 1.60–1.72). Finally, the mean value of taxonomic species richness (BSR) was 58.40 (95% CL 57.26–59.54), while functional species richness (FRic) amounted to 0.12 (95% CL 0.11–0.12).

3.2 Models for the Great Grey Shrike \( (\check{\mathsf{D}}_{\mathsf{GGS}}) \)

Out of all analysed Generalised Additive Models (GAM), only five gained support using information-theoretic criteria, showing AIC weights > 0 (Appendix A, Table S4, model GGS2). The most parsimonious model (Table 1) included 7 environmental variables with the Relative Importance RI > 0 (Appendix A, Table S5, Fig. S1). According to this model, a predictive map of spatial density distribution was created (Fig. 2a). In turn, in the Random Forest (RF) approach, the model with selected predictors based on MSEOOB ≥ 14.0 explained a higher percentage of variation than the full model (Appendix A, Table S6). The best, i.e. the selected model (Table 1), included 7 predictors (Appendix A, Table S5, Fig. S2). Similarly to the most parsimonious model for the GAM, a predictive map of density distribution was also developed for the best RF model (Fig. 2b). According to the model evaluation procedure (Table 1, Figs. 3a, b, Fig. 4a, b), the RF model had higher predictive power than the GAM.

Table 1 Comparison of GAM and RF predictive effectiveness
Fig. 2
figure 2

Predictive maps of the Hayne Estimator of Great Grey Shrike density (a based on GAM), (b based on RF); the Hayne Estimator of Great Spotted Woodpecker (c based on GAM, d based on RF), bird species richness (e based on GAM, f based on RF), functional species richness (g based on GAM, h based on RF)

Fig. 3
figure 3

Correlation between real density/species richness and predictive density/richness

Fig. 4
figure 4

Residuals’ distribution for the best supported GAM and the RF. a GAM for Great Grey Shrike density, b RF for Great Grey Shrike density, c GAM for Great Spotted Woodpecker, d RF for Great Spotted Woodpecker, e GAM for bird species richness, f RF for bird species richness, g GAM for functional species richness, h RF functional species richness

3.3 Models for the Great Spotted Woodpecker \( (\check{\mathsf{D}}_{\mathsf{GSW}}) \)

In this case five GAMs showed AIC weights > 0 (Appendix A, Table S4, model GSW2). Model selection procedures resulted in 6 predictors with the Relative Importance RI > 0, but only 5 were included in the best-supported model (Appendix A, Table S5, Fig. S3). On the other hand, the RF model included 8 predictors based on MSEOOB ≥ 22.0, and it was better than the full model (Appendix A, Table S6, Fig. S4). Following the best RF and GAM, two predictive maps for the \( \hat{D} \)GSW were developed (Fig. 2c, d). However, the model evaluation procedure showed that—contrary to the Great Grey Shrike—the GAM had higher predictive power for the Great Spotted Woodpecker’s predictive density than the RF (Table 1, Figs. 3c, d, 4c, d).

3.4 Models for Bird Species Richness (BSR)

Three GAMs gained support in BSR models using information-theoretic criteria with AIC weights > 0 (Appendix A, Table S4, model 2BSR). The most parsimonious model included five predictors, but six variables showed RI > 0 (Appendix A, Table S5, Fig. 5S). In the RF approach, the selected model with 7 predictors based on MSEOOB ≥ 9.0 had a relatively higher importance than the full model (Appendix A, Table S6). According to best GAM and RF models, we created predictive maps of spatial BSR distribution (Fig. 2e, f). Having evaluated the models (Table 1), we reached a conclusion that in this case the GAM had higher predictive power than the RF (Table 1, Figs. 3e, f, 4e, f).

3.5 Models for functional species richness (FRic)

For the second measurement of species richness, i.e. FRic, four GAMs showed AIC weights > 0 (Appendix A, Table S4, model FRic2). The most parsimonious model included six predictors (Appendix A, Table S5, Fig. S7), and on its basis we created a predictive map (Fig. 2g). On the other hand, the selection procedure for the RF method showed that the best model included 9 predictors (Appendix A, Table S6, Fig. S8), all of which had MSEOOB ≥ 9.0 (Appendix A, Table S5). Similarly to the best GAM, we developed a predictive map based on the best RF model (Fig. 2h). Here the evaluation procedure (Table 1, Figs. 3g, h, 4g, h) showed that the RF had higher predictive power than the GAM.

3.6 Spatial autocorrelation

In all cases the Moran’s I statistics for residuals coming from each model were very low. We found statistically significant positive autocorrelation for the GAM and the RF model of the Great Grey Shrike density (M = 0.021, p = 0.02 and M = 0.011, p = 0.04, respectively) and bird species richness (M = 0.014, p = 0.04 and M = 0.033, p = 0.03, respectively). Residuals from the Great Spotted Woodpecker models indicated spatial autocorrelation only for the GAM (M = 0.012, p = 0.05), while for the RF we found no statistical significance in the Moran’s I statistics (M = 0.009, p = 0.06). Finally, for the FRIc we found opposite results, i.e. there was no spatial autocorrelation in the GAM (M = 0.007, p = 0.07), while in the RF the positive spatial effect was low (M = 0.019, p = 0.03).

4 Discussion

The study supports a hypothesis that the choice of a machine learning modelling technique may influence predictive accuracy of bird density as well as species richness. Apart from the fact that in both variants of the models relationships between environmental variables and response functions were both linear and non-linear and also similar, the Random Forest (RF) detected a larger number of important predictors than the Generalised Additive Model (GAM). Consequently, in each case predictive maps were qualitatively different. This result corresponds to earlier studies based on occurrence models, where different predictive models based on machine learning methods often detected different numbers of important predictors (Pourtaghi et al. 2016). On the one hand, it was proved that the RF approach often had the lowest prediction error (Prasad et al. 2006; Cutler et al. 2007; Syphard and Franklin 2009; Vorpahl et al. 2012; Oliveira et al. 2012), but on the other, the effectiveness of the GAM had also been repeatedly confirmed (Ferguson et al. 2006; Vilchis et al. 2006; Becker et al. 2010; Mannocci and Catalogna 2014; Mannocci and Laren 2014). Our results are consistent with both of the above conclusions, because we found that the GAM was the best method for the Great Spotted Woodpecker density and bird species richness, whereas the RF had the highest predictive power for the Great Grey Shrike and functional species richness. So, we put forth an important question: What drives the differences between the two kinds of models? At first, we assumed that differences between models resulted from inherent spatial autocorrelation of species distributions, which could be driven in different ways in different models (Lichstein et al. 2002). However, the results from the Moran’s I statistics obtained in this study (in all cases) were low (range − 0.040 to 0.021), while α-probabilities, i.e. the probability of the study to reject the null hypothesis, were relatively high, leading to a conclusion that spatial autocorrelation did not affect particular models. Thus, we conjectured that differences between models’ predictive power resulted from different ecology of the studied species, which could be described by various methods of analysis. Here the results confirm such a speculation, because in this study the effectiveness of a given method depends on the number of predictors (environmental components) that affect density and/or species richness. The highest density of the Great Spotted Woodpecker was observed in the core of all forest types (deciduous, mixed and coniferous), while other habitat components, e.g. aspect, ndvi-jun, altitude, temperature, played only a marginal role. Correspondingly, taxonomic bird species richness was associated with the ecotone between open and forest habitats, while other environmental elements, e.g. ndvi-jun, precipitation, shape of forest, only slightly improved the models. However, the Great Grey Shrike and functional species richness responded positively to more detailed elements of the environment. In both cases the most important components included the complicated shape of open habitats, the topographical gradient and the level of green vegetation. Both the density of the Great Grey Shrike and FRic are associated with mixed arable fields, where small forests are interspersed with small arable fields and meadows with a high level of green vegetation at the beginning of spring as well as a high roughness index. Additionally, FRic depends also on habitat and topography configuration metrics, i.e. relationships between the core and the buffer zone in the preferred habitats. Therefore, we conclude that the GAM is the most suitable method for species density and measures of species richness that require habitat requirements with sharp boundaries in the landscape (specialist species), i.e. the Great Spotted Woodpecker, and taxonomic species richness (increased efficiency as compared to the RF by 27% and 42%, respectively), while the RF method has the lowest prediction error for functional species richness and more generalist species, such as the Great Grey Shrike (increased efficiency as compared to the GAM by 34% and 30%, respectively). Hence, the results of this study confirm earlier suggestions (Oppel et al. 2012) that different analytical assumptions of models combined with species’ specific ecology do influence the predictive power of machine learning methods.

Both methods are very useful for modelling many kinds of ecological phenomena, because they do not need assumptions about the distribution of predictors, and they allow for a mixed use of categorical and numerical variables. They are also capable of considering nonlinearities between variables (Becker et al. 2010; Hastie and Tibshirani 1990; Breiman 2001). Nevertheless, from the biological point of view, the biggest differences between these methods concern the assessment of predictors’ importance as well as the development of partial models which are subsequently included in the final model.

The GAM produces one “best” model based on forward or backward selection independent variables, where particular predictors are linearised or included in the model as a polynomial spline (Hastie and Tibshirani 1990). The evaluation of the model and the estimation of predictors’ importance are based on information-theoretic criterion, i.e. the AIC and relative importance (RI, Burnham and Anderson 2002). Although this approach can be criticised (Whittingham et al. 2005; Reino et al. 2010), an analysis of all possible models (based on a dredge procedure in mgcv library for R) is often used when there is not enough a priori information to develop a small set of models (Reino et al. 2010) or it can be useful in exploratory analyses or even when ecologists have vague ideas how a functional form relates explanatory variables to response variables (Pourtaghi et al. 2016). However, in this case the GAM underestimates densities of generalist species and functional species richness by underrating structures related to landscape configuration, such as topography, vegetation, geometry of patches, just as it had been previously noted for specialist species (Kosicki 2017). Therefore, a plausible conclusion is that it would be a better solution in the presented predictive perspective to average models with weights > 0 rather than choose one most parsimonious models (Grenouillet et al. 2010). When models are averaged, predictors, such as elevation, ndvi-may and/or precipitation, detect in the second and third candidate models partial influence of the response function, which is then indirectly reflected in the predictive map. However, the process of averaging many GAMs comes at a cost, because when there is a plethora of predictors an average model is usually difficult to interpret.

On the other hand, instead of the “best model” the RF constructs multiple models, formed on random samples based on the bootstrap technique (sampling with replacement) (Breiman 2001). The assessment of predictive importance is not based on the information-theoretic criterion, but on specified changes mean decrease (MSEOOB) in accuracy where one predictor is removed from the model (Berk 2008; Breiman 2001), a procedure which results in a ranking of predictors. Then, by using a backward elimination search function implemented in the VSURF library for R (Ismail and Mutanga 2010; Mutanga et al. 2012), it is possible to identify the minimum number of predictors that offer the best predictive accuracy. Such an approach helps simplify the modelling framework without the risk of reducing R2 (Guyon and Elisseeff 2003; Kohavi and John 1997). Thanks to this approach, we found only 8 environmental parameters out of total 48 predictors for the Great Grey Shrike and 11 for FRic, still achieving the best overall predictive accuracy. However, for the Great Spotted Woodpecker and taxonomic species richness, 9 and 6 independent variables, respectively, were taken into account, and they displayed moderate predictive performance when compared with the GAM.

It should be noted that we also used an additional biological evaluation of models. Standard methods, such as the AUC or the TSS (Jimenez-Valverde 2014; Lobo et al. 2008; Smith 2013) are inadequate, because this type of evaluation can be useful for occurrence models rather than density/richness models. Besides, the AUC and/or the TSS fail to show biological significance of models (Fourcade et al. 2014; Stolar and Nielsen 2015). In this study the AIC for the GAM and the OOB for the RF provide information only about a model’s fit, but not statistics that would indicate biological reality of the model (Araújo et al. 2005). Therefore, before developing the models, 20% of the observations that would be left out had been randomly chosen and treated as an independent dataset. The correlation coefficient between real densities and densities estimated according to the results from the RF and the GAM procedures safeguarded this semi-independent assessment. Although this approach is debatable (James et al. 2013) as evaluation data is only a subset of the training data (Elith et al. 2006), this subset was randomly selected from 970 grid cells which had been inspected for 13 years, thus the data can be deemed unbiased. Finally, this simple procedure intuitively indicates the predictive model’s biological ability without going into complicated algorithms (Howard et al. 2014), such as Monte-Carlo validation, which is a robust estimation of a model’s performance, but very complicated to interpret (James et al. 2013).

5 Conclusion

The study shows that the effectiveness of different Species Distribution Modelling methods depends on the ecology of the studied species. Predominantly, the Generalised Additive Model is responsive to eurybiont (generalists) i.e. species with a broad tolerance range, whereas the Random Forest detects more subtle relationships between density and environmental variables, making it more suitable for stenobiont (specialists) species with a low tolerance range. Our study clearly indicates that effective modelling of bird species richness, as well as taxonomic and functional species richness on a large spatial scale is contingent upon the recognition of habitat preferences on a small and meso-geographical scale. The findings contribute significantly to applied ecology by showing that the predictive power of several bird species’ models depends on an ecological complexity of systems, including a complex interplay between many environmental components that can only be captured by means of diversified modelling techniques.