1 Introduction

Continuous monitoring of honey bee hives has been the subject of considerable attention from beekeepers, apidologists and researchers for over a century (Meikle and Holst 2015). Many different metabolic and environmental parameters have been considered to characterise the state and health of honey bee colonies. Such parameters include local CO2 gas concentrations to determine metabolic rate (Seeley 1974), temperature and humidity in the hive to investigate hive metabolic activity and homeostasis (Southwick and Moritz 1987; Human et al. 2006), hive weight to indicate colony size and food reserves (Meikle et al. 2008), vibration and sound to indicate swarming events (Ferrari et al. 2008) and bee foraging activity as a behavioural method to assess the effects of pesticide exposure (Pham-Delegue et al. 2002; Schneider et al. 2012).

Previous research has shown that bee species are affected by weather conditions in various ways. Bumble bees have been shown to preferentially collect pollen over nectar when weather conditions at the foraging site are warm, dry and windy, and to prefer nectar otherwise (Peat and Goulson 2005). Honey bees appear able to anticipate rainfall, as evidenced by higher foraging effort on the day before heavy rainfall (He et al. 2016). Different bee species also prefer to forage at different temperatures (Vicens and Bosch 2000). Weather directly affects hygro- and thermodynamic processes within the pollinator, influencing survival rate and the energy cost of foraging (Corbet 1990). Weather also affects pollinator activity indirectly, by altering the quantity and sugar concentration of nectar in flowers (Corbet 1990). A complete understanding of the factors influencing bee foraging effort has yet to emerge. It is, however, clear that weather assessment, in terms of the potential risk and reward of foraging, is an important aspect of bee sensory ecology. A better understanding of the relationship between bees and the weather could help farmers identify and match suitable bee species to their crops, given latitude, flowering season and local climate.

Honey bee foraging rate can be readily measured as the rate at which bees enter and leave the hive. This rate can hold information about the supply and demand for food in the colony (Mclellan 1977). Supply is determined by the amount of nectar and pollen available for bees to collect from flowers within their foraging range, as well as the amount of food stored in the hive as honey. Demand depends on colony size and age distribution (Mclellan 1977). To better document the influence of hive-extrinsic conditions on bee activity, data from hive monitoring equipment can be coupled with data from other sensors, such as those of a meteorological station (Burrill and Dietz 1981; Devillers et al. 2004).

A pioneering study (Burrill and Dietz 1981) investigated the response of bee colonies to two meteorological variables: temperature and solar radiation (SR). Bee activity was measured using electro-optical counters that recorded each entry to and exit from the hive. This study reported a positive correlation between temperature and bee activity. For SR, the data show a positive correlation up to a radiation threshold (0.66 Langleys/min, or about 460 W/m²), followed by a negative correlation at higher SR levels (> 460 W/m²). Although strictly correlative, this evidence strongly suggests that bee foraging effort is modulated by external weather conditions. Thus, individual bees must be capable of detecting and evaluating these atmospheric conditions.
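Because the Burrill and Dietz threshold is an irradiance (a flux), the Langley figure only converts to watts per square metre when read as Langleys per minute. The short sketch below shows the arithmetic, assuming the usual definition of the Langley as one thermochemical calorie (4.184 J) per square centimetre.

```python
# 1 Langley (Ly) = 1 thermochemical calorie per square centimetre.
CAL_TO_J = 4.184          # joules per thermochemical calorie
CM2_PER_M2 = 1e4          # square centimetres per square metre
SECONDS_PER_MINUTE = 60

def langley_per_min_to_w_per_m2(ly_per_min: float) -> float:
    """Convert an irradiance in Langleys per minute to watts per square metre."""
    joules_per_m2_per_min = ly_per_min * CAL_TO_J * CM2_PER_M2
    return joules_per_m2_per_min / SECONDS_PER_MINUTE

threshold = langley_per_min_to_w_per_m2(0.66)
print(f"{threshold:.0f} W/m2")  # → 460 W/m2
```

The same factor (about 697 W/m² per Ly/min) converts any other radiation value reported in Langleys per minute.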

With the proliferation of affordable microelectronics, several companies have brought bee hive activity monitoring to market: ApiScan® (Lowland Electronics, Leffinge, Belgium) came to market in the 1990s and offers remote counting of bee ingress and egress rates via electro-optical sensors at the hive entrance. More recently, Arnia® (Arnia Remote Hive Monitoring, United Kingdom) and HiveMind® (HiveMind Precision Agriculture, Christchurch, New Zealand) have integrated activity sensors with internal and external environmental sensors that provide remote data collection and at-a-glance analysis of the data to beekeepers. They purport to provide instant hive theft detection as well as information about the productivity and health of each of the user’s honey bee colonies.

Here, we present a method that continuously captures hive activity, rain, solar radiation, temperature, humidity, atmospheric pressure, wind speed and wind direction at a time resolution of 1 sample/min. We attached a custom-made multichannel electro-optical bee counter to a National honey bee hive at a field site in North Somerset, UK (latitude 51.4237; longitude −2.6711). We also deployed a weather station to record meteorological variables at the site of the hive. Both devices fed data into a central database for storage and subsequent quantitative analysis (for full methodological detail, see supplementary online material). The aims of this study are to investigate the influence of weather on bee foraging activity and to explore the utility of predictive modelling for commercial honey bee health monitoring.

We present data from experimental periods covering two foraging seasons: July–September 2013 and June–September 2014, using a different honey bee colony for each. Each colony was housed in a standard National hive with one large brood box and one standard super for honey storage. These periods include several days reserved for hive maintenance (Varroa treatment and other beekeeping activities). Data from these days were omitted from the analysis. In total, there are 42 full days of data from 2013 and 74 full days from 2014. The omitted days were 25th July, 19th August and 17th September in 2013, and 1st August 2014.

We determined the correlation between each measured meteorological variable and bee egress rate (ER). We then fitted a generalised linear model (GLM) to the data. We assessed the power of our model to predict bee activity when given novel data, not used in the fitting of the model. Model predictive power was assessed by examining the coefficient of determination between predicted and observed values from the novel dataset. We discuss the possible application of predictive modelling for real-time health monitoring of honey bee colonies in the context of the above-mentioned automated bee monitoring services.

2 Results

2.1 Example data

Visual inspection of the data clearly shows periods of low solar radiation and temperature accompanied by low bee egress rate (ER). This can be illustrated with an example: Figure 1 shows bee activity, temperature and solar radiation recorded between 0600 and 2200 on June 7th, 2014. Sunrise on this day was at 0455, but there was significant cloud cover until shortly after 1100, as can be seen from the solar radiation data. Following the gradual increase in light level as the cloud cover subsides, the bees begin to leave the hive in larger numbers, as shown by the increase in ER. Another drop in ER can be observed around 1330, when cloud cover returns. ER increases again when these clouds subside between 1400 and 1430 (Figure 1).

Figure 1.

Example day of data (7 June 2014) showing bee egress rate (blue), temperature (red) and solar radiation (orange, filled). This day was cloudy until shortly after 11:00. Immediately after the clouds cleared, indicated by an increase first in solar radiation and then in temperature, the bees began to leave the hive at an increased rate.

2.2 Initial analysis of correlation

For each day of data, we calculate the correlation between bee ER and each of temperature, solar radiation and humidity, using both Spearman’s rank correlation and Pearson’s linear correlation. Using Spearman’s rank correlation, we find that R_S(SolRad, ER) = 0.81 and R_S(Temp, ER) = 0.83, indicating a very strong positive correlation (Figure 2). For humidity, the correlation is negative, with R_S(Hum, ER) = −0.74. Pearson’s linear correlation on the same dataset gives similar values (R_P = 0.72, 0.78 and −0.75, respectively), indicating that the relationships are mostly linear. The correlation is above 0.6 (or below −0.6) for the large majority of days. The other meteorological variables measured in this experiment were at most moderately correlated with bee activity: R_S(Rain, ER) = −0.06, R_S(Pres, ER) = −0.08 and R_S(Wind, ER) = 0.42.
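The per-day correlation analysis can be sketched in plain Python as follows. This is an illustrative reimplementation, not the authors’ code; the `days` mapping and its layout (one predictor series and one egress series per date, sampled per minute) are assumptions. Spearman’s coefficient is computed as Pearson’s coefficient of the ranks, with tied values given their average rank.

```python
from math import sqrt

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def daily_correlations(days):
    """days: hypothetical mapping date -> (predictor_series, egress_series).
    Returns {date: (R_S, R_P)} for each day."""
    return {d: (spearman(x, y), pearson(x, y)) for d, (x, y) in days.items()}
```

A Spearman value close to 1 alongside a noticeably lower Pearson value would indicate a monotonic but non-linear relationship; the similar values reported above are what points to near-linearity.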

Figure 2.

Spearman’s rank correlation for each day of data: solar radiation against bee egress rate, R(S,ER), in yellow; temperature against bee egress rate, R(T,ER), in orange; and humidity against egress rate, R(H,ER), in blue. Boxplots show the median (line), interquartile range (box) and the 1st and 99th percentiles (whiskers). Outliers are marked with black crosses. Most values fall above 0.6 (or below −0.6), indicating good correlation. The median correlation exceeds 0.8 for temperature and solar radiation and is close to −0.8 for humidity.

Temperature, solar radiation and humidity are also well correlated with one another: R_S(SolRad, Temp) = 0.75, R_S(SolRad, Hum) = −0.63 and R_S(Temp, Hum) = −0.82. Collinearity between predictor variables can increase the uncertainty of estimated regression coefficients. To quantify this, we calculated the variance inflation factor (VIF) for each variable. All predictor variables had a VIF below 3, under the maximum acceptable value, which ranges in the literature from 4 (e.g. Pan and Jackson 2008) to 10 (e.g. Kennedy 2003). Collinearity between the predictor variables therefore does not substantially reduce our confidence in the fitted model coefficients.
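The VIF check can be sketched with NumPy, as below. It uses the standard identity that the VIFs equal the diagonal of the inverse of the predictors’ correlation matrix, which is equivalent to computing 1/(1 − R_j²) from regressing each predictor on all the others. The function name and the column layout of `X` (one column per predictor, e.g. temperature, solar radiation, humidity) are illustrative assumptions.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
    on the other predictors (with an intercept). This equals the j-th
    diagonal element of the inverse of the predictors' correlation matrix.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))
```

Applied to a matrix of minute-level temperature, solar radiation and humidity readings, every returned value would need to stay below the chosen cut-off (3 in the analysis above).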

2.3 Model results

Before fitting our models, we split the dataset into two parts, placing 90% of the samples in a training dataset and the remaining 10% in a validation dataset. The training set is used for model fitting. The resulting model is then evaluated on the predictor values in the validation set, and the predicted ER is compared with the observed values. We measured fit quality using the coefficient of determination (R²) between predicted and observed values.
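A minimal sketch of the training/validation split and the validation R² is given below. Two points are assumptions, since the text does not settle them: the sketch holds out whole days (consistent with whole validation days being plotted later), and it defines R² as 1 − SS_res/SS_tot on the held-out data rather than as a squared correlation. The helper names are hypothetical.

```python
import random

def split_by_day(dates, train_fraction=0.9, seed=0):
    """Randomly assign whole days to training or validation.

    dates: per-sample date labels. Returns (train_indices, valid_indices).
    """
    days = sorted(set(dates))
    rng = random.Random(seed)
    rng.shuffle(days)
    n_train = round(train_fraction * len(days))
    train_days = set(days[:n_train])
    train_idx = [i for i, d in enumerate(dates) if d in train_days]
    valid_idx = [i for i, d in enumerate(dates) if d not in train_days]
    return train_idx, valid_idx

def r_squared(observed, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot on held-out data."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot
```

With the 116 days analysed here, a 0.9 fraction would place 104 days in training and 12 in validation.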

To create a single model for multiple hives, we needed to correct for the difference in overall population between hives; otherwise, ER would be underestimated for high-population hives and overestimated for low-population hives (Figure 3a). Since the actual population is not known, a constant scaling parameter λ was calculated for each hive, equal to the average of the daily maximum ER across the entire dataset for that hive. After including this scaling parameter as a predictor, the error distributions for both hives are similar (Figure 3b). Including multiple hives helps to decouple the model fit from the idiosyncrasies of individual hives (orientation, shade, wind exposure etc.).
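The per-hive scale factor λ is straightforward to compute from the minute-resolution counts. In this sketch, `records` is a hypothetical list of (date, egress rate) pairs for one hive; the function returns the mean over days of each day’s maximum egress rate, as described above.

```python
from collections import defaultdict

def hive_scale_factor(records):
    """lambda for one hive: mean over days of that day's maximum egress rate.

    records: iterable of (date, egress_rate_per_minute) pairs for a single hive.
    """
    daily_max = defaultdict(lambda: float("-inf"))
    for date, er in records:
        daily_max[date] = max(daily_max[date], er)
    return sum(daily_max.values()) / len(daily_max)
```

For example, two days with maxima of 50 and 30 counts per minute give λ = 40.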

Figure 3.

Model prediction errors for small and large hives, and size normalisation with the scale factor λ included as a predictor. a Distribution of errors in the predicted daily mean bee egress rate with temperature and solar radiation as predictors. ER is overestimated for the smaller colony (2013, red) and underestimated for the larger colony (2014, blue). b Including the scale factor centres both distributions around zero and allows size-independent comparison.

We first created a model using temperature and solar radiation as the meteorological predictor variables. Individually, these variables yield models with R² = 0.53 (temperature) and R² = 0.37 (solar radiation). Used together, they give a much better fit (R² = 0.78). Figure 4a shows 8 days of ER data from the validation set (four from each hive), along with the model predictions. The empirical cumulative probability distribution function (CPDF) of the prediction errors \( y_i - \hat{y}_i \) gives the probability that a prediction is accurate to within a given number of counts (Figure 4b, c). The σ, 2σ and 3σ confidence bounds are also displayed to visualise the confidence of each prediction. When tested against validation data, the model’s predictions were within 10 counts per minute (cpm) of the measured value 65% of the time and within 39 cpm 95% of the time (R² = 0.78; σ = 10, 39, 76).
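The text does not state the GLM’s error family or link function. Assuming a Gaussian family with identity link, the fit reduces to ordinary least squares, sketched below with NumPy; λ enters as an additional additive predictor, following the Figure 3 caption. The function names are hypothetical, and a count-oriented alternative (e.g. a Poisson family with log link) would need a different fitting routine.

```python
import numpy as np

def design_matrix(temp, solrad, lam):
    """Columns: intercept, temperature, solar radiation, hive scale factor."""
    temp, solrad, lam = (np.asarray(v, dtype=float) for v in (temp, solrad, lam))
    return np.column_stack([np.ones_like(temp), temp, solrad, lam])

def fit_egress_model(temp, solrad, lam, egress):
    """Least-squares fit of egress ~ 1 + temp + solrad + lam.

    A GLM with Gaussian family and identity link has exactly these
    maximum-likelihood coefficients (assumed model form, see lead-in).
    """
    X = design_matrix(temp, solrad, lam)
    coef, *_ = np.linalg.lstsq(X, np.asarray(egress, dtype=float), rcond=None)
    return coef

def predict_egress(coef, temp, solrad, lam):
    """Predicted egress rate (counts per minute) for new sensor readings."""
    return design_matrix(temp, solrad, lam) @ coef
```

Predictions from a linear fit like this can fall below zero at night or in cold weather, so a practical implementation might clip them at zero.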

Figure 4.

A generalised linear model was fitted to the data using temperature, solar radiation and a scale factor as predictors, and bee egress rate as the response. a Measured (blue) and predicted (red) bee egress rate for 7 days of data that were not used in fitting the model. The grey lines show the 1σ confidence bounds of the prediction. b Histogram of prediction errors, showing an exponential distribution in both the original fitting data (blue) and the validation data (red). c Empirical cumulative probability distribution of the errors, showing a 65% chance that the prediction falls within 10 bees per minute of the measured value, and a 95% chance that it falls within 40 cpm.
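One plausible way to obtain the σ, 2σ and 3σ bounds and the “within 10 cpm” figures is to read them off the empirical distribution of absolute prediction errors, as in the sketch below. Defining the bounds as the 68.27th, 95.45th and 99.73rd percentiles of |y_i − ŷ_i| is an assumption (it avoids assuming normally distributed errors); the text does not say exactly how they were computed, and the function names are hypothetical.

```python
import numpy as np

def abs_errors(observed, predicted):
    """Absolute prediction errors |y_i - y_hat_i| as a float array."""
    return np.abs(np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float))

def sigma_bounds(observed, predicted, coverages=(0.6827, 0.9545, 0.9973)):
    """Error magnitudes containing the given fractions of predictions
    (empirical analogues of 1, 2 and 3 sigma bounds)."""
    return np.quantile(abs_errors(observed, predicted), coverages)

def fraction_within(observed, predicted, tolerance):
    """Empirical CDF of |error| evaluated at `tolerance`, e.g. 10 cpm."""
    return float(np.mean(abs_errors(observed, predicted) <= tolerance))
```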

Figure 5 shows the same day of data as the example above (June 7th, 2014; Figure 1), with the predicted egress rate plotted against the measured one. To create this figure, the entire example day was held out in the validation set and was not used to fit the model. Two key features of this day are the low bee activity until a sharp increase shortly after 11:00, and the distinct decrease around 14:00. Both features appear clearly in the predicted as well as the observed data (Figure 5).

Figure 5.

Bee egress rate for the example day from Figure 1 (June 7th, 2014) (blue), along with the model prediction (red). Several features of the measured data are replicated in the prediction, including low levels until 11:00 h, a drop in ER around 14:00 h and the gradual cessation of foraging after 18:00 h. Grey lines show the 1σ confidence bounds of the prediction.

The correlation between the actual (Act) and predicted (Pre) values is R_S(Act, Pre) = 0.90 (Spearman) and R_P(Act, Pre) = 0.88 (Pearson), indicating a strong, linear relationship between the model predictions and the measured values. The shape of the measured time series clearly reappears in the model’s predictions (Figures 4a and 5). Since the model has no temporal component, this suggests that the temporal fluctuations in daily bee activity closely track instantaneous fluctuations in solar energy, received either as light or as heat. The model’s R² of 0.78 indicates that most of the variation in bee egress rate is explained by variation in the model’s predictors. The remaining variation can be represented by a noise term drawn at random from the empirical distribution of the prediction errors.
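The noise term can be implemented by resampling stored residuals: drawing uniformly at random from the observed prediction errors is the same as sampling from their empirical cumulative distribution. The sketch below adds such draws to deterministic predictions; clipping simulated counts at zero is an added assumption, and the function name is hypothetical.

```python
import random

def simulate_egress(predicted, residual_pool, seed=None):
    """Add empirical noise to deterministic egress-rate predictions.

    Each simulated value is prediction + a residual (observed - predicted)
    drawn uniformly from `residual_pool`, i.e. sampled from the residuals'
    empirical distribution. Negative simulated rates are clipped to zero.
    """
    rng = random.Random(seed)
    return [max(0.0, p + rng.choice(residual_pool)) for p in predicted]
```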

Importantly, this analysis does not rule out the contribution of other meteorological variables as determinants of bee foraging activity, but the power of this simplified model to explain the observed data suggests that solar radiation and temperature play a dominant role.

2.4 Ranking the contributions of each predictor variable to the model fit

The contribution of each predictor variable is measured as the average increase in model R² when the variable is used as a predictor, compared with when it is left out. The strongest contribution to model fit comes from temperature, followed by solar radiation, humidity and pressure (Table I). Adding pressure improves the model fit, but only slightly: R² = 0.81 (σ = 9, 35, 80) with pressure included, compared with R² = 0.78 (σ = 10, 39, 76) without it. The contributions from wind speed and rainfall are negligible.

Table I The improvement in model coefficient of determination (R²) and Akaike information criterion (AIC) when each variable is added as a predictor. The top row shows the improvement from adding each individual predictor to a constant model; the bottom row shows the mean improvement when it is added to models containing every other combination of predictors.
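The contribution measure behind Table I can be sketched as below: for each predictor, the gain in a fit statistic is averaged over every subset of the other predictors, including the empty (constant-only) model, whose gain alone corresponds to the top row of the table. `score` is a hypothetical callable that fits a model on a given set of predictor names and returns its R² (or, with the sign flipped, its AIC). This unweighted average over subsets differs from a Shapley-value decomposition, which weights subsets by their size.

```python
from itertools import combinations

def mean_contributions(predictors, score):
    """Average gain in `score` from adding each predictor.

    For predictor v, score(S | {v}) - score(S) is averaged over every subset
    S of the remaining predictors, including the empty set.
    `score` maps a frozenset of predictor names to a fit statistic.
    """
    contributions = {}
    for v in predictors:
        others = [p for p in predictors if p != v]
        gains = []
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                gains.append(score(s | {v}) - score(s))
        contributions[v] = sum(gains) / len(gains)
    return contributions
```

With six predictors this requires fitting each of the 64 possible models once; caching `score` results avoids refitting the same subset repeatedly.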

Collinearity between temperature, solar radiation and humidity introduces some uncertainty into the relative contributions assigned to them. This uncertainty is within generally acceptable bounds (VIF < 3 for each), and it does not affect the relative contributions assigned to rainfall, pressure and wind speed, since these do not co-vary with the other variables. We therefore conclude that, while the relative contributions of temperature, solar radiation and humidity are subject to some small error, their combined influence is the primary explanatory factor for the observed bee foraging rate.

2.5 Model generalisation to longer time scales

Taking a coarser approach, the daily averages of both predicted and measured ER can be computed (Figure 6). This establishes whether the model’s predictions hold over a wider time frame, by comparing predicted and actual daily averages of bee activity (Figure 6a). The line y = x is also shown (points on this line are perfectly predicted by the model). Prediction errors are distributed roughly around 0 (mean prediction error = 0.38 ± 0.07 cpm, N = 116 days), with a 90% chance of falling within 10 counts per minute of the observed value (Figure 6b, c). If this analysis is repeated with weekly instead of daily averages, the agreement between observed and predicted values remains (N = 18 weeks, data not shown). The model can therefore be applied at different temporal resolutions and still provide useful predictions of bee activity.
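The per-day comparison can be sketched as follows: minute-level observations and predictions are averaged within each day, and the daily errors are summarised by their mean, its standard error and the fraction falling within a tolerance. Whether the “± 0.07” quoted above is a standard error is an assumption; the function names are hypothetical. Passing week identifiers instead of dates gives the weekly version of the analysis.

```python
from collections import defaultdict
from math import sqrt

def daily_means(dates, values):
    """Mean of per-minute values for each date label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for d, v in zip(dates, values):
        sums[d] += v
        counts[d] += 1
    return {d: sums[d] / counts[d] for d in sums}

def daily_error_summary(dates, observed, predicted, tolerance=10.0):
    """Mean daily error (observed - predicted), its standard error, and the
    fraction of days with |error| <= tolerance (counts per minute).
    Requires at least two days."""
    obs = daily_means(dates, observed)
    pre = daily_means(dates, predicted)
    errors = [obs[d] - pre[d] for d in obs]
    n = len(errors)
    mean = sum(errors) / n
    sd = sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    within = sum(abs(e) <= tolerance for e in errors) / n
    return mean, sd / sqrt(n), within
```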

Figure 6.

Comparing model predictions with observed data on a per-day basis. a Mean daily egress rate predicted by the generalised linear model against the measured value. The line y = x is shown for comparison with a perfect fit. Grey error bars show the standard error of the mean in both the actual and predicted data. b Cumulative probability distribution function (CPDF) of the residuals, showing that 90% of the predictions are within 10 cpm of the measured value. c, d Distribution of the residuals (\( y_i - \hat{y}_i \)). The model shown includes atmospheric pressure as a predictor.

3 Discussion

3.1 Bee hive biology and sensory ecology

We have shown that day-to-day and minute-to-minute variations in the number of foraging bees leaving the hive are well explained by weather conditions outside the hive. The number of bees leaving the hive to forage is most strongly correlated with local temperature and solar radiation. Continuous monitoring of these variables, together with bee egress rate, allows us to generate an accurate prediction (within a quantified error margin) of the hive’s current egress rate. This result suggests that information about current local meteorological conditions, in particular temperature and solar radiation, may be continuously gathered by bees and used in deciding whether to leave the hive. It is also reasonable to suppose that similar weather-monitoring and decision-making processes occur outside the hive, influencing when bees return from foraging trips.

At least one other study reports the use of predictive modelling to investigate the relationship between bees and local climate (Devillers et al. 2004). These authors used co-inertia analysis to show that solar radiation and temperature data share common structure with bee egress rate, whereas humidity, rainfall and wind speed do not. They then used radiation and temperature values in a partial least squares regression, a linear modelling approach broadly comparable to the one described here. They produced separate models for each of the hives in their experiment, achieving coefficients of determination of 0.62–0.72 (Devillers et al. 2004). They suggested that barometric pressure would be a valuable quantity to investigate, but did not deploy pressure sensors in their experiment. Here, we include barometric pressure as a predictor variable, and we show that data from two hives can be pooled to produce predictions for both hives with a single model.

Our results agree in general with Devillers et al. (2004), although our model predictions agree more closely with the observed data (R² = 0.62–0.72 for Devillers et al.; R² = 0.79–0.81 in our study). This may be because we pooled training data from more than one hive (adding a per-hive linear scaling factor to do so). Combining two independent hives in the training data captures more of the possible variation in bee activity than a single hive can. We also recorded data over two foraging seasons, capturing more variation in weather conditions. Finally, we added pressure sensors to our experiment. Barometric pressure accounts for an additional 2% of the variation in bee egress rate when added as a predictor alongside solar radiation and temperature, suggesting that it is not an important influence on bee foraging activity. If pooling data from two hives is indeed what improved predictive power, adding further independent hives may improve it further. A large-scale study of this kind, over several years and with many instrumented hives, is warranted.

For a predictive model to serve as an explanation of an observed animal behaviour, it must not require processing or memory beyond what can reasonably be attributed to the animal itself. At the point of evaluation, our model is a simple function of instantaneous input values read directly from sensors, so its storage and processing requirements are small. A functionally similar model would therefore place relatively few demands on bee neural circuitry. This benefit also extends to practical applications: low storage and processing requirements would allow the creation of integrated sensor packages that store only the pre-trained model function, and no data, locally. Such devices could be small and inexpensive, covering more hives for a given budget.

The predictive success of our model, despite having no temporal component, may indicate that bees do not need to hold past meteorological observations in memory to effectively modulate their current foraging behaviour. Instead, it may be that instantaneous local observations are sufficient to generally produce effective foraging behaviour.

The sensory ecology and neurobiology of the interaction between bees and their environment, specifically with respect to climate, may prove to be a promising and useful area of investigation. We propose that further study of this question is timely, particularly in view of uncertainty about the Earth’s future climate. The overall productivity and health of bee species directly affect human food security through pollination services (Rader et al. 2015). More detailed knowledge of how bee colonies respond to changing climatic conditions could lead to better-informed conservation of domesticated and wild bees and to improved colony management by commercial beekeepers.

3.2 Remote hive health monitoring: a possible warning system for beekeepers

Commercial beekeepers typically manage several thousand hives. In the Pacific Northwest of the USA in 2010, for example, the average commercial beekeeping operation maintained 3284 colonies and grossed US$488,660 in colony rental fees (Burgett 2011). The US Department of Agriculture estimates that the cost to commercial beekeepers of replacing lost colonies was US$2 billion between 2006 and 2012 (Epstein et al. 2012). If the labour cost of hive maintenance can be reduced through automation, or if hive abandonment can be detected early, before the complete loss of the colony, this may have a significant economic impact on the industry. To be practically applicable, our model must generalise to an arbitrary number of hives, which would require a large training dataset covering many hives and a wide range of weather conditions. If the predictive accuracy demonstrated here with two hives holds for more hives, the predictions may be of general practical value.

With commercial remote hive monitoring solutions such as Arnia® and HiveMind® available, commercial and scientific apiaries can readily be equipped with all the sensors required to generate predictive models such as those described here. Input from the meteorological sensors could generate real-time predictions of activity level. A “calibration mode” would allow the model to train itself on incoming data, automatically estimating parameters such as the scaling factor for each hive. Hive activity monitoring systems could then perform the analyses described above dynamically, using live incoming data.
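A calibration mode could estimate each hive’s scale factor incrementally, so that a device never has to store raw count data. The class below is a hypothetical sketch of that idea: it tracks the running maximum for the current day and folds each completed day’s maximum into a running mean. It assumes readings arrive in chronological order.

```python
class ScaleFactorCalibrator:
    """Hypothetical calibration-mode helper that tracks a hive's scale factor
    (mean of daily maximum egress rates) from a live per-minute stream,
    storing only a few numbers instead of the raw data."""

    def __init__(self):
        self.current_day = None
        self.current_max = float("-inf")
        self.n_days = 0
        self.mean_daily_max = 0.0

    def _close_day(self):
        # Incremental mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k
        self.n_days += 1
        self.mean_daily_max += (self.current_max - self.mean_daily_max) / self.n_days

    def update(self, day, egress_rate):
        """Feed one reading; a change of day closes the previous day."""
        if self.current_day is not None and day != self.current_day:
            self._close_day()
            self.current_max = float("-inf")
        self.current_day = day
        self.current_max = max(self.current_max, egress_rate)

    @property
    def scale_factor(self):
        """Mean of completed days' maxima, or None before any day completes."""
        return self.mean_daily_max if self.n_days else None
```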

Comparison of model predictions with observed bee count rates may indicate aberrant behaviour such as hive abandonment, parasite infestation or swarming. The monitoring program could easily implement simple logical checks based on the difference between observed and predicted bee activity. Discrepancies could be tracked over time, and simple thresholds applied to raise warnings to the user. The severity of these warnings could be determined by the size of the discrepancy and the length of time over which it has persisted. Large discrepancies on time scales of days or weeks could be detected by averaging prediction errors over these time scales. Brief discrepancies on the scale of minutes, on the other hand, could be recorded and quantified but ignored if only transient. Warning thresholds could also be calculated automatically during calibration mode by performing the present model analysis (Figure 5) at multiple time scales: first, the CPDF of the residuals would be calculated for a training set; residuals larger than a specified percentile would then generate warnings for the beekeeper. A reliable system of this kind could save labour by reducing the need for systematic, periodic checking of every hive in a large commercial apiary. Instead, hive inspections could be targeted in response to specific alerts indicating a high probability of deviation from expected hive activity.
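The threshold scheme outlined above can be sketched in NumPy: residuals (observed minus predicted egress rate) are averaged over several window lengths, each window gets a threshold equal to a high percentile of the averaged calibration residuals, and live averaged residuals exceeding it are flagged. The window lengths, the 99th-percentile default and all function names are illustrative assumptions; grading warnings by how long a flag persists is left out for brevity.

```python
import numpy as np

def rolling_mean(x, window):
    """Trailing moving average over `window` samples; earlier entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def calibrate_thresholds(training_residuals, windows, percentile=99.0):
    """Per-time-scale thresholds: a high percentile of |averaged residual|
    in the calibration (training) data."""
    return {
        w: float(np.nanpercentile(np.abs(rolling_mean(training_residuals, w)), percentile))
        for w in windows
    }

def activity_warnings(live_residuals, thresholds):
    """Boolean flags per time scale where the averaged live residual
    exceeds its calibrated threshold (incomplete windows are never flagged)."""
    flags = {}
    for w, limit in thresholds.items():
        smoothed = np.abs(rolling_mean(live_residuals, w))
        flags[w] = np.nan_to_num(smoothed, nan=0.0) > limit
    return flags
```

With minute-resolution residuals, windows of 60 and 1440 samples would correspond to hourly and daily averaging.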

In our study, we did not encounter or induce hive pathology or swarming, so we cannot offer empirical evidence that pathological deviations can be detected with our system. Combining experiments like those described here with detailed manual colony health assessments could clarify this.

The addition of other sensors may reveal predictor variables not considered here, in particular variables associated with the interior of the hive. Brood temperature and sound measurements have been used successfully in continuous hive monitoring experiments (Dietlein 1985; Ferrari et al. 2008). Ferrari et al. (2008) showed that, prior to swarming, there was a significant increase in sound amplitude within the hive at certain characteristic frequencies, and that hive temperature rose by 3 °C in the period immediately before swarming. Hive sound and temperature sensors, gas sensors and other continuous monitoring equipment could be deployed alongside bee activity monitors, and the contribution of these variables to model predictive power then investigated.

4 Conclusion

Hive bee activity counters and meteorological sensors can be used together to create predictive models that accurately capture the behavioural response of a bee colony to external stimuli. Here, we have shown that bee foraging effort, measured as the number of bees leaving the hive on foraging flights each minute, is tightly coupled to coincident meteorological conditions near the hive. The most important of these conditions are temperature and solar radiation. The lag between changes in weather conditions and changes in bee foraging activity is no longer than our 1-min sampling interval. These results suggest that bees continuously monitor their external environment and use the information gained to decide whether to embark on foraging trips. The sensory mechanisms mediating this phenomenon, at the level of individual bees or of the colony, are not well understood and warrant further investigation.

We suggest that the predictive modelling applied here could be used to monitor honey bee hives in real time. In a large-scale experiment with many hives equipped with networked egress sensors, hives showing large departures from model predictions could be manually investigated and directly compared with those behaving as expected. This could shed light on other behaviour-influencing factors in the bees’ environment, both internal and external to the hive. Further, such an activity monitoring approach could help identify meteorological conditions that endanger bee colonies by putting them under metabolic stress or increasing their susceptibility to disease. It would also help determine whether anomalously large prediction errors correlate with hive pathologies or other aberrant conditions, which may be useful for generating health warnings in commercial hive monitoring equipment. The proposed activity warnings could improve the effectiveness and focus of beehive health monitoring by reducing the need for regular manual inspection of apiaries. Detailed investigation of coincident meteorological conditions, air quality, pesticide presence and other potential stressors could reveal currently unknown interactions between bees and their environment.