1 Introduction

In addition to honey production, honey bees such as Apis mellifera L. in Europe are essential to support biodiversity as they are the most important pollinators of multiple plant species. As much as 80% of all pollination is attributed to honey bee activity [1]. Increased (winter) mortality rates are observed worldwide among honey bees as well as solitary bees [2,3,4,5]. In the past, high winter mortality rates were reported for France for the winter of 1999–2000 [2, 6], the USA [7, 8], and more recently Belgium [4] and China [9]. Winter mortality of complete colonies is generally attributed to a complex combination of biological and environmental conditions, and poor apiary management. Reported causes of winter mortality include the level of infestation with the Varroa mite [10,11,12], the connectivity of the natural landscape [13], the use of plant protection products [14,15,16,17,18,19,20], and beekeeper experience and practices [5, 21].

Winter mortality rates are a key indicator for the weakness of bee colonies [2, 4, 6, 21–23] with mortality rates of 10% being considered acceptable [3]. In Belgium the reported average bee mortality rates for the winters 2008–2009 and 2009–2010 were well above this threshold with 18% and 26% respectively [21]. More recently, a feasibility study was carried out by some of the authors for the region of Flanders to examine the significance of potential causes for bee winter mortality by means of linear regression [22]. The methodology was considered useful although the number of apiaries was insufficient to draw definitive conclusions on the main causes of bee winter mortality. In the follow-up study called BeeHappy-Wallonia, the analysis shifted to the region of Wallonia [24], applying binomial instead of linear regression. This was considered more appropriate in view of the binomial nature of the sampled mortality rates for the colonies. Compared with the region of Flanders, the southern region of Wallonia is less urbanised with a higher proportion of agricultural and natural land use. This can be expected to affect the winter mortality rates, as honey bees are affected by the quality of the landscape and food availability.

The analysis presented in this manuscript is related to the study for the region of Wallonia and was organised in three steps. First, spatially differentiated data for the winter mortality rates and potential causes, here referred to as explanatory variables, were identified. All data were collected and spatially allocated on a 1-ha raster grid, using GIS analysis. Next, the winter mortality rates and raw data for the explanatory variables were subjected to an iterative binomial regression analysis with a stepwise model design including non-linearities and interactions between terms, using the Akaike information criterion for model selection [25]. Finally, the resulting regression models were evaluated on the presence of spatial autocorrelation and overdispersion. The data sampling, regression modelling, and results are discussed in the following sections.

2 Methods

2.1 Data Sampling of Winter Mortality Rates

The winter mortality rates were obtained for the region of Wallonia from the EU-funded EPILOBEE project [3]. Belgium was one of the 17 EU member states participating in the project, with the Federal Agency for the Safety of the Food Chain (FASFC) as responsible administration. The main aim was to get an overview of honey bee colony losses on a harmonised basis in each of the participating EU countries, by collecting data for the two consecutive winters of 2012–2013 and 2013–2014. The surveillance protocol was based on common guidelines for honey bee health provided by the European Reference Laboratory ANSES [26]. In Belgium on average 15 apiaries were selected in each of the 10 provinces out of the total number of 3000 apiaries registered by the FASFC in 2012 [22]. The number of registered apiaries in 2012 represents less than half of the total number of apiaries in Belgium. The data were collected with a uniform spatial distribution over Belgium (Fig. 1). For Wallonia a total of 300 colonies were sampled in 74 apiaries for the winter 2012–2013 and 307 colonies in 73 apiaries for the winter 2013–2014 [3], with a sample size between 1 and 6 colonies. The average size of these samples was considered sufficient to derive reliable estimates for the mortality rate at the apiary level. This analysis at the apiary instead of the colony level is inspired by the level of accuracy for the description of the environmental conditions. Furthermore, the data on Varroa infestation could not systematically be linked to the mortality rates at the colony level due to limitations in the quality and quantity of the data.

Fig. 1 Winter mortality rates for the sampled honey bee colonies in Belgium [3] for the winter 2012–2013 (left) and 2013–2014 (right), highlighting the region of Wallonia. Figure prepared in ArcGIS® version 10.1

The winter mortality rates for Belgium (Fig. 1) do not show a clear visible spatial pattern. Nevertheless, the average winter mortality rates were lower for the region of Wallonia, with 31% for the first winter and 9% for the second winter, based on the sample size per apiary and ratio of raw counts [22].

2.2 Potential Causes of Bee Winter Mortality

Data were collected for 26 potential causes of winter mortality, corresponding to 8 categories which were considered relevant: meteorological conditions [27], urbanisation [28], air quality [29, 30], electromagnetic radiation [31,32,33], the use of plant protection products [14,15,16, 18,19,20], food availability [34], pathogens and diseases [10,11,12, 23], and finally the profile of the apiarist (expertise and good practice) [5, 21].

A complete specification of all variables, data sources, and sampling protocols used in this study is found in Table 1. The available land use data allowed assessment of spatially differentiated environmental conditions that could have a negative impact on the foraging activities, such as the fragmentation of the open landscape. For some of the explanatory variables such as the meteorological variables, data were only available as point data for a limited number of locations. Spatial kriging interpolation using a spherical semivariogram was used here to translate data for individual weather stations into a map covering the region, ensuring data were available for all apiary locations with mortality rates.
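As an illustration of the spherical semivariogram mentioned above, the following minimal sketch evaluates the standard spherical model for a given separation distance. The parameter names (`nugget`, `partial_sill`, `range_`) and the example values are assumptions for illustration only; the fitted parameters used in the study are not reported in this section.

```python
def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Spherical semivariogram model.

    Rises from the nugget to nugget + partial_sill at distance range_
    and stays constant beyond it. All parameter values are hypothetical.
    """
    if h < 0:
        raise ValueError("distance must be non-negative")
    if h == 0:
        return 0.0  # semivariance at zero separation is zero by definition
    if h >= range_:
        return nugget + partial_sill
    ratio = h / range_
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


# Hypothetical example: semivariance between two weather stations 25 km apart
print(spherical_semivariogram(25.0, nugget=0.1, partial_sill=2.0, range_=50.0))
```

In a kriging workflow, a function of this kind supplies the semivariances from which the interpolation weights are solved; the solving step itself is omitted here.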

Table 1 Definition and sampling protocol used in this study for the explanatory variables [24]

The geospatial interpolation of the meteorological variables requires further clarification. The variable Flying Hours (Fig. 2) was obtained for each weather station by multiplying the number of sunshine hours per day with the proportion of the day between sunrise and sunset with a temperature above 10 °C [22]. Next, geospatial kriging interpolation [35] was used to project the results on a 1-ha grid.
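The definition of Flying Hours can be sketched for a single day and weather station as follows. The function name, the use of equally spaced temperature readings between sunrise and sunset, and the example values are assumptions for illustration; the study's actual data format is not described here.

```python
def flying_hours(sunshine_hours, daylight_temps_c, threshold_c=10.0):
    """Daily Flying Hours for one station.

    sunshine_hours   : hours of sunshine on that day
    daylight_temps_c : equally spaced temperature readings (deg C) taken
                       between sunrise and sunset (assumed input format)
    threshold_c      : temperature above which bees are assumed to fly
    """
    if not daylight_temps_c:
        raise ValueError("need at least one daylight temperature reading")
    warm = sum(1 for t in daylight_temps_c if t > threshold_c)
    proportion_warm = warm / len(daylight_temps_c)
    return sunshine_hours * proportion_warm


# Hypothetical day: 6 h of sunshine and ten readings between sunrise and sunset
temps = [8.0, 9.5, 11.0, 13.0, 14.5, 15.0, 13.5, 12.0, 10.5, 9.0]
print(flying_hours(6.0, temps))
```

Summing such daily values over the flying season would give a station total that can then be interpolated over the grid.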

Fig. 2 Flying Hours during flying season in Belgium for the year 2012. Figure prepared in ArcGIS® version 10.1

An identical procedure was used to derive the geospatial distributions for the variables Frost Days and Brood Season Temperature [22]. The number of (critical) frost days is taken into account because, after a period of higher temperatures at the end of the winter, bees will increase the distance between each other in the hive. This results in less protection against a sudden cold period. The variable Brood Season Temperature refers to the winter bees which are needed for the colony survival. If the temperature remains too high during a prolonged summer, the transition to winter bees occurs too late. The average temperature of the period September–October is considered to be representative for the brood season. A distinct spatial distribution pattern could be observed for Frost Days and Brood Season Temperature as well as the other explanatory variables (Fig. 3).

Fig. 3 Spatial patterns for selected explanatory variables for the region of Wallonia. a PM2.5 concentration (fine particulate matter). b NO2 concentration. c Urbanisation. d Telecom Tower Density. e Use of Neonicotinoids imidacloprid and thiamethoxam. f Use of glyphosate. g Landscape Connectivity. h Varroa Infestation. Figure prepared in ArcGIS® version 10.1

A honey bee typically forages over a distance of up to 1 km, with a maximum of 3 km around the beehive [46]. This behaviour can be represented by a circular, normalised distance decay mask around the beehive. For every explanatory variable, a distance decay function (Fig. 4), adopted from Hagler [47], was used to multiply within the foraging area the data with the distance weights for each pixel, followed by summation to obtain an aggregated value for the apiary. An exception was made for Food Availability and Plant Protection Products, which were averaged without distance decay mask because honey bees adapt their foraging pattern to the direction of food-rich locations [22, 47].

Fig. 4 The distance decay mask applied to the circular foraging area. Figure prepared in MATLAB® version 2016a
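The aggregation with a circular, normalised distance decay mask can be sketched as below. The decay function taken from Hagler [47] is not reproduced in this section, so the exponential decay in the example is purely a placeholder assumption, as are the function names and the grid layout (a list of rows with one value per 1-ha cell).

```python
import math


def decay_mask(radius_px, weight):
    """Circular mask of (2r+1) x (2r+1) weights, normalised to sum to 1.

    weight is a function of the distance in pixels; cells farther than
    radius_px from the centre get weight 0.
    """
    size = 2 * radius_px + 1
    mask = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            d = math.hypot(i - radius_px, j - radius_px)
            if d <= radius_px:
                mask[i][j] = weight(d)
    total = sum(map(sum, mask))
    return [[w / total for w in row] for row in mask]


def aggregate(grid, row, col, mask):
    """Weighted sum of grid values around (row, col).

    Cells of the mask that fall outside the grid are skipped.
    """
    r = len(mask) // 2
    value = 0.0
    for i, mask_row in enumerate(mask):
        for j, w in enumerate(mask_row):
            y, x = row + i - r, col + j - r
            if w and 0 <= y < len(grid) and 0 <= x < len(grid[0]):
                value += w * grid[y][x]
    return value


# On a 1-ha grid one cell is 100 m wide, so 3 km corresponds to 30 cells.
# Placeholder decay: exponential with a scale of 10 cells (1 km).
mask = decay_mask(30, lambda d: math.exp(-d / 10.0))
```

Because the mask is normalised, a spatially constant variable keeps its value after aggregation, which makes the aggregated values comparable between apiaries.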

The data for some apiaries were excluded from the analyses. Apiaries without data on the infestation with Varroa, known to be an important cause of bee mortality, were excluded from the analysis to avoid data inconsistency. Apiaries with colonies that were transported during the flying season were excluded as well because the apiary location did not correspond to the actual exposure of the bees. Hence, only 136 out of the 147 available apiaries were taken into account [24]. The datasets generated during this study are available on reasonable request.

2.3 Data Grouping

An important step in the analysis is to examine the presence of collinearity and to group the explanatory variables based on a significant positive or negative correlation, or functional similarity. The cross correlations of the explanatory variables are shown in Fig. 5.

Fig. 5 Correlogram (136 observations) with correlation coefficients between −1 (maximal negative correlation in red) and 1 (maximal positive correlation in blue)

The following conclusions were drawn from the correlation analysis:

  • a significant positive correlation is observed between most of the herbicides, insecticides, and fungicides. This can be attributed to the homogeneous distribution of these products over the agricultural parcels;

  • a significant positive correlation is observed between Urbanisation, NO2 and PM2.5, Telecom Tower Density, and glyphosate. The latter correlation can be explained by the intensive application of herbicides in urbanised areas;

  • strong negative correlations are observed of Food Availability, and to a lesser extent Landscape Connectivity with the herbicides, insecticides, and fungicides;

  • NO2 and PM2.5 are positively correlated with Brood Season Temperature and negatively with Food Availability and Landscape Connectivity.
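A correlation screening of this kind can be sketched with the Pearson correlation coefficient. The threshold of 0.7 for flagging a pair, the variable names, and the toy values below are assumptions for illustration; the study's criterion for a significant correlation is not specified in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    The threshold is a hypothetical screening value.
    """
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs


# Toy values for four variables at five apiaries (not real data)
toy = {
    "urban": [1.0, 2.0, 3.0, 4.0, 5.0],
    "no2": [2.0, 4.0, 6.0, 8.0, 10.0],
    "food": [5.0, 4.0, 3.0, 2.0, 1.0],
    "frost": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(correlated_pairs(toy))
```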

The observed correlations between the different insecticides, herbicides, and fungicides point to a need for further grouping prior to the regression analysis. Correlations exist between some of the variables related to environmental conditions and apiary management, but these variables are difficult to group while excluding them from the analysis is not desirable. Therefore, two different datasets were created for the analysis:

  • Dataset A: raw dataset of all available explanatory variables related to the environmental conditions, Varroa Infestation, and beekeeping management, consisting of a total of 26 variables

  • Dataset B: grouped dataset consisting of 17 variables, with the following grouping and relabelling of the raw data of the plant protection products: fungicides (captan, mancozeb, propiconazole, tebuconazole, and thiram), herbicides (glyphosate), and insecticides (abamectin, alpha-cypermethrin, beta-cyfluthrin, cypermethrin, dimethoate, imidacloprid, spinosad, and thiamethoxam).

Both datasets were subjected to the regression modelling.

2.4 Regression Modelling

The bee winter mortality was analysed by means of logistic regression, using the canonical logit link function

$$ \log \left(\frac{Y}{1-Y}\right)={\beta}_0+{\beta}_1{x}_1+{\beta}_2{x}_2+\dots +{\beta}_N{x}_N $$
(1)

where Y is the winter mortality as a fraction of the total number of colonies in the apiary, xi are predictors based on linear or nonlinear functions of the explanatory variables and products of these functions, and βi the regression coefficients. Binomial regression is preferable over ordinary linear regression in case data are based on trials from samples of a limited sample size, in this case the number of dead colonies (between 0 and the maximum of 6 observed colonies per apiary). This sample size (number of colonies) in each apiary was taken into account in the binomial regression.
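To make the role of the sample size concrete, the sketch below evaluates the binomial log-likelihood implied by Eq. (1) for a set of apiaries. The data layout (number of dead colonies, number of sampled colonies, list of predictor values) and the function names are assumptions for illustration; this is not the authors' implementation, and the fitting step that maximises the likelihood is omitted.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def binomial_log_likelihood(beta, rows):
    """Log-likelihood of a binomial model with logit link.

    beta : [beta_0, beta_1, ...] with beta_0 the intercept
    rows : iterable of (dead, total, predictors) per apiary, where
           predictors is a list matching beta[1:] (assumed layout)
    """
    ll = 0.0
    for dead, total, x in rows:
        eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
        p = inv_logit(eta)
        ll += (math.log(math.comb(total, dead))
               + dead * math.log(p)
               + (total - dead) * math.log(1.0 - p))
    return ll


# Hypothetical apiaries: (dead colonies, sampled colonies, [mean-centred predictor])
apiaries = [(1, 4, [-0.5]), (3, 6, [1.2]), (0, 2, [-0.7])]
print(binomial_log_likelihood([-1.0, 0.8], apiaries))
```

An apiary with six sampled colonies contributes more terms to the likelihood than one with a single colony, which is how the sample size enters the regression.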

The adjusted Akaike information criterion or AICc [25, 47] is a useful, relative measure of the quality of statistical models for a given set of data:

$$ \mathrm{AICc}=\mathrm{AIC}+\frac{2k\left(k+1\right)}{N-k-1}=2k-2\log L+\frac{2k\left(k+1\right)}{N-k-1} $$
(2)

where k is the number of independent variables (predictors) in the model, N is the sample size, and L is the value of the likelihood function for the model. Models with a lower AICc value are preferred.
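Equation (2) translates directly into a small helper. The function name and the example numbers are hypothetical; only the formula itself is taken from the text.

```python
def aicc(log_likelihood, k, n):
    """Adjusted Akaike information criterion as in Eq. (2).

    log_likelihood : log L of the fitted model
    k              : number of model terms counted by the criterion
    n              : sample size
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = 2 * k - 2 * log_likelihood
    return aic + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical model with 5 terms fitted to 136 observations
print(aicc(-100.0, k=5, n=136))
```

The correction term grows quickly as k approaches N, so with a small sample each additional regressor must improve the likelihood noticeably to be retained.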

The general procedure for the regression analysis consisted of three distinct steps:

  1. Application of four different functions (linear, squared, cubic, and natural logarithm) to the variables, followed by mean centering.

  2. Stepwise binomial regression analysis of the mean-centred predictors, adding one regressor at a time, starting from the intercept model. The selection of the regressor is based on the maximal reduction of the AICc value.

  3. Evaluation of the final model for quality of the model fit and the presence of overdispersion.
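The first step of this procedure can be sketched as follows. The labels, the function name, and the decision to skip the logarithm for variables containing zero or negative values are assumptions for illustration; the text does not state how such values were handled.

```python
import math

# The four functions named in step 1
TRANSFORMS = {
    "linear": lambda v: v,
    "squared": lambda v: v ** 2,
    "cubic": lambda v: v ** 3,
    "log": math.log,  # natural logarithm, needs strictly positive input
}


def centred_predictors(name, values):
    """Apply each transform to the raw values, then subtract the mean."""
    out = {}
    for label, func in TRANSFORMS.items():
        if label == "log" and min(values) <= 0:
            continue  # assumption: skip log when it is undefined
        transformed = [func(v) for v in values]
        mean = sum(transformed) / len(transformed)
        out[f"{label}({name})"] = [t - mean for t in transformed]
    return out


# Hypothetical raw values of one explanatory variable at three apiaries
print(centred_predictors("Flying Hours", [1.0, 2.0, 3.0]))
```

Centring after the transformation gives every candidate predictor a mean of zero, which reduces the correlation between a variable and its own powers or products.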

This procedure is executed twice: first considering predictors as main effects only, and a second time also allowing for two-way interaction effects. The stepwise model selection proceeds by adding regressors one by one until the AICc index reaches a minimum, after which further expansion of the model increases the AICc because the penalty term outweighs the improvement in model fit.
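The stopping rule of the stepwise selection can be sketched generically. In the sketch below, `score` stands for any function that returns the AICc of a model with the given terms; here it is replaced by a toy function with made-up gains and penalty so that the example stays self-contained. Names and numbers are hypothetical.

```python
def forward_select(candidates, score):
    """Greedy forward selection.

    Starting from the empty (intercept-only) model, repeatedly add the
    candidate that lowers the score most; stop when no candidate lowers it.
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial = min(remaining, key=lambda c: score(selected + [c]))
        trial_score = score(selected + [trial])
        if trial_score >= best:
            break  # penalty outweighs the gain in fit
        selected.append(trial)
        remaining.remove(trial)
        best = trial_score
    return selected, best


# Toy stand-in for an AICc evaluation: each term brings a fixed gain in fit
# and costs a fixed penalty of 2 (both invented for this example).
GAINS = {"varroa": 30.0, "log_frost": 12.0, "urban": 1.0}


def toy_score(terms):
    return 100.0 - sum(GAINS[t] for t in terms) + 2.0 * len(terms)


print(forward_select(GAINS, toy_score))
```

In a real analysis, each call to the scoring function would refit the binomial model, which makes the procedure considerably more expensive when two-way interactions are admitted as candidates.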

The final model is validated by checking for overdispersion, using a dispersion index defined as the ratio of the residual sum of squares (SSE) to the number of degrees of freedom. Overdispersion is indicated by a value significantly larger than 1.
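A dispersion index of this kind can be sketched as below. The text does not say which residuals enter the sum of squares; the sketch assumes Pearson residuals, a common choice for binomial models, and takes the degrees of freedom as the number of apiaries minus the number of estimated parameters. Function name and example values are hypothetical.

```python
def dispersion_index(dead, total, fitted_p, n_params):
    """Sum of squared Pearson residuals divided by the degrees of freedom.

    dead     : observed number of dead colonies per apiary
    total    : number of sampled colonies per apiary
    fitted_p : fitted mortality probability per apiary
    n_params : number of estimated regression coefficients
    """
    df = len(dead) - n_params
    if df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    chi2 = sum((y - n * p) ** 2 / (n * p * (1.0 - p))
               for y, n, p in zip(dead, total, fitted_p))
    return chi2 / df


# Hypothetical intercept-only fit for three apiaries with four colonies each
print(dispersion_index([1, 2, 0], [4, 4, 4], [0.25, 0.25, 0.25], n_params=1))
```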

Spatial autocorrelation can be detected with the Global Moran Index (GMI) [48]:

$$ \mathrm{GMI}=\left(\frac{N}{W}\right)\times \left(\frac{\sum_{i=1}^N{\sum}_{j=1}^N{w}_{ij}\left({x}_i-\mu \right)\left({x}_j-\mu \right)}{\sum_{i=1}^N{\left({x}_i-\mu \right)}^2}\right) $$
(3)

where N is the total number of spatial locations with residual x, μ is the spatial mean of the residuals, wij is the weight assigned to the distance pair (xi, xj), and W the sum of all weights. The GMI falls in the range (−1, 1). A value significantly different from 0 indicates spatial autocorrelation, which the statistical model would then have to take into account. Different weight models can be used for the GMI. For this study, an inverse distance model with a maximum range was applied, with the range chosen to ensure that at least one neighbouring apiary was located within the range of the distance function. This maximum range was 47 km, well beyond the average foraging distance of a honey bee [46]. Application to the observed mortality rates resulted in a value of −0.055 for the region of Wallonia, indicating the absence of spatial autocorrelation. Hence, no spatial autocorrelation terms need to be included in the regression model.
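Equation (3) with inverse distance weights and a maximum range can be sketched as follows. The function name, the coordinate format, and the toy data are assumptions for illustration; the significance test for the index is omitted.

```python
import math


def global_moran(points, values, max_range):
    """Global Moran Index with inverse distance weights.

    points    : list of (x, y) coordinates, in the same unit as max_range
    values    : one value per point
    max_range : pairs farther apart than this get weight 0
    """
    n = len(values)
    mu = sum(values) / n
    dev = [v - mu for v in values]
    numerator = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if 0.0 < d <= max_range:
                w = 1.0 / d
                weight_sum += w
                numerator += w * dev[i] * dev[j]
    denominator = sum(d * d for d in dev)
    if weight_sum == 0.0 or denominator == 0.0:
        raise ValueError("index undefined: no neighbours in range or constant values")
    return (n / weight_sum) * (numerator / denominator)


# Toy transect of four locations 1 km apart with steadily increasing values
locations = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(global_moran(locations, [1.0, 2.0, 3.0, 4.0], max_range=1.0))
```

A smooth gradient such as the toy transect yields a positive index, whereas values that alternate between neighbours yield a negative one; values close to zero, as reported for Wallonia, point to no spatial structure.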

Finally, McFadden’s pseudo coefficient of determination [49], a common metric for binomial regression modelling, was used to verify the model fit with the actual mortality data:

$$ {R}_{\mathrm{MF}}^2=1-\frac{\log \left({L}_i\right)}{\log \left({L}_0\right)} $$
(4)

where Li is the likelihood for the actual model and L0 the likelihood for the intercept model.
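Equation (4) can be evaluated directly from the two log-likelihoods. The function name and the example values are hypothetical.

```python
def mcfadden_r2(log_lik_model, log_lik_null):
    """McFadden's pseudo R-squared as in Eq. (4).

    log_lik_model : log-likelihood of the fitted model
    log_lik_null  : log-likelihood of the intercept-only model
    """
    if log_lik_null == 0.0:
        raise ValueError("null log-likelihood must be non-zero")
    return 1.0 - log_lik_model / log_lik_null


# Hypothetical log-likelihoods of a fitted and an intercept-only model
print(mcfadden_r2(-43.0, -100.0))
```

A value of 0 means the model is no better than the intercept model, and values move towards 1 as the fitted log-likelihood approaches zero.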

3 Results

3.1 Regression Analysis

Table 2 gives an overview of the key regression metrics for the different regression models, based on the ungrouped (dataset A) and grouped (dataset B) data, comparing models without and with interactions included. Comparison based on the AICc index, pseudo R2, and dispersion shows that the regression models including interactions have a better score, without overdispersion and spatial autocorrelation of the residuals.

Table 2 Overview of the regression models

Figure 6 shows the stepwise reduction of the AICc index for the ungrouped dataset A and grouped dataset B with and without interactions included.

Fig. 6 AICc index against the iteration step for dataset A (black) and dataset B (red). Predictors are indicated until interactions are added to the model. Figure prepared in MATLAB® version 2016a

Table 3 shows the predictor metrics for the grouped dataset B with interactions between the predictors. This regression model is proposed as the best model, retaining seven of the original seventeen explanatory variables. The grouping of the plant protection products into functional groups (Fig. 5) effectively reduced the impact of spatial autocorrelation while preserving a good model fit (Table 2).

Table 3 Regression metrics for dataset B (grouped) with interactions with mean-centred functions of the predictors: (1) linear, (2) squared, (3) cubic, and (4) natural logarithm

3.2 Model Evaluation

The chosen model based on the grouped dataset uses 7 explanatory variables: Varroa Infestation, Frost Days, Flying Hours, Fungicides, Beekeeper Practice, Landscape Connectivity, and Insecticides. The variables are ranked based on the order in which they are added to the regression model. The explanatory variable Varroa Infestation is clearly the most dominant predictor in the regression model because it is added as the first predictor and as the third predictor by means of its squared transformation.

Comparing the four different regression models, the following observations are potentially relevant for apiary and environmental management:

  • All models have the same beginning in the stepwise adding of regression terms (Fig. 6): Varroa Infestation, log(Frost Days), squared(Varroa Infestation), and log(Flying Hours);

  • From the fifth step onwards, different predictors are added for the two datasets. In both models, however, a plant protection product is added to the model: the Insecticide abamectin in the ungrouped dataset, and the group Fungicides in the grouped dataset;

  • In the chosen model, Beekeeper Practice and Landscape Connectivity are added in subsequent steps;

  • Looking at the order in which interactions are added for this model, the most significant interactions occur for Frost Days, Landscape Connectivity, Flying Hours, and Beekeeper Practice;

  • The explanatory variables Urbanisation, Telecom Tower Density, NO2, PM2.5, Brood Season Temperature, Food Availability, and Herbicides (glyphosate) do not occur in any of the regression models. Beekeeper Experience only appears in the ungrouped model with interaction terms.

Furthermore, two observations are made which are relevant from a modelling perspective:

  • Taking into consideration the order in which regression terms are added to the model, it appears that interactions are less crucial compared with non-linearities. Nevertheless, both have a clear beneficial effect on the AICc (Table 2) and should be included. In addition, the pseudo R2 increases from 0.39 to 0.57 for the grouped dataset (Table 2);

  • Grouping variables facilitates the interpretation of results but does not affect the ranking (order of being added to the regression model) of the important predictors.

By itself, the AICc-based selection of predictors cannot rule out overdispersion. However, the regression analyses result in models with a reasonable model fit and a dispersion index close to one (Table 2), which indicates the absence of overdispersion in the models.

4 Discussion and Conclusion

The European honey bee species (Apis mellifera L.) plays a key role for plant pollination in Europe, indirectly sustaining the production of food crops. The worldwide increase of bee winter mortality is a growing concern for the apiary sector, environmental protection agencies, agriculture, and society. Cross comparison with the environmental conditions and apiary management points to a large number of potential causes of bee winter mortality, the complex interaction of which is not yet fully understood. Laboratory experiments, model simulations, and colony-level sampling of apiaries are not able to explain the combined impact of all factors scientifically. Furthermore, spatially explicit analyses of the combined impact of natural and man-induced causes of honey bee winter mortality are still rare due to limitations in data availability and differences in the sampling protocols used for bee health.

Together, the EPILOBEE data and high-resolution environmental and landscape data for the region of Wallonia provided a unique opportunity to analyse the causes of winter mortality. Step-by-step construction of a generalised regression model combined with the adjusted Akaike information criterion proved very useful to identify and rank the main causes of bee winter mortality to focus apiary management. The analyses indicate the need to include both non-linearities and interactions in the regression modelling. Testing after each step shows that no overdispersion was detected when adding the individual terms to the regression model. For the grouped dataset, seven out of seventeen variables were retained: Varroa Infestation dominates as primary cause of bee winter mortality, followed in order of significance (adding to the model) by Frost Days, Flying Hours, Fungicides, Beekeeper Practice, Landscape Connectivity, and Insecticides.

The current study exploits the available data for the 136 observations on winter mortality in the region of Wallonia to the extent possible. Awaiting improvements in the quantity and quality of data, the approach followed can be useful for gaining a first understanding of the relative importance of environmental conditions, biological aspects, and apiary practices for bee winter mortality.

Varroa Infestation was the most dominant predictor in the regression models. Future data sampling campaigns of high quality and spatial resolution could help identify the main causes of the more and more frequent Varroa outbreaks. This information would be extremely useful for improving apiary management and systematic Varroa treatment. A recommended short-term strategy is to focus on this predominant variable while harmonising the monitoring of infection rates and improving the exchange of information and expertise between apiarists. The analyses with the raw data point to a need for further analysis of the role of the insecticide abamectin and the fungicide captan as potential harmful substances, as well as a potential importance of Flying Hours, Beekeeper Practice, and Landscape Connectivity as relevant manageable factors, and interactions of these variables. The current approach could help organise sampling programs in a more efficient manner, by focusing on the key indicators for each stressor group (meteorological conditions, environmental conditions, diseases, and apiary management).

Data on the use of plant protection products were available with limited spatial detail for the region of Wallonia. It would be worthwhile to have data available on the regional differences to improve the quality of the explanatory variable. Improvements should also include the extension of the monitoring to additional winters and seasons, accounting for the time dependency of climate, nutritional and other conditions, and colony-level sampling of mortality rates.

This study does not yet exploit the existing scientific understanding of the biology of the honey bee or known dependencies between specific variables. A hybrid approach, combining statistical analysis with scientific expertise or land use change models [50], could increase the predictive power and efficiency of the analysis. Finally, it would be interesting to apply the sampling strategy and geostatistical analysis to other social-environmental problems by using this spatially explicit approach. For example, it could be worthwhile to examine the dependency between public health and the combination of exposure to air pollution and use of plant protection products.