1 Introduction

Social insects, including ants, bees, wasps, and termites, are among the most ecologically dominant animals on the planet. Social bees, for example, are among the world’s most important pollinators and play an essential role in global food production, where a third of consumable crops come from pollinator-dependent plants [1]. However, studying the behavioral responses of social insects to environmental perturbations poses several challenges. One central challenge is precisely tracking multiple animals while they perform rich behavioral repertoires. Compounding this challenge, ecologically relevant behaviors often occur in complex visual settings and over extended time scales (up to several days, weeks, or even months [2]). In addition, individual animals within social insect colonies vary significantly in behavior, and this variation has functional impacts on colony performance [3]. Thus, a central challenge in studying the behavior of social insect colonies is high-resolution quantification of individual behavior while maintaining identity over extended time periods. Finally, for social insects, the colony—rather than the individual animal—is the functional unit for critical ecological and evolutionary processes, highlighting the importance of approaches that scale readily to replication across multiple colonies.

Recent advances at the intersection of machine learning/computer vision, automation, and ecology have resulted in significant progress on each of these challenges. The use of fiducial tag-based tracking is now well-established for social insects and has been used to study long-term behavioral dynamics within social insect colonies, albeit with limited behavioral resolution [2, 4, 5]. More recently, the application of deep learning (e.g., DeepLabCut [6], TRex [7], DeepPoseKit [8], colony-wide markerless tracking [9, 10], and SLEAP [11]) has led to rapid progress in detailed behavioral quantification via markerless animal tracking and pose estimation. In parallel, recent work has established automated, robotic approaches for high-throughput, parallel imaging of multiple social insect colonies [12]. When paired with robust methods for maintaining individual identities over time [13, 14], such approaches provide a framework for quantifying collective behavior and responses to ecologically relevant stressors across extended time periods [15].

Here, we describe an approach for long-term monitoring of multiple entire bumble bee (Bombus spp.) colonies that combines the complementary strengths of these approaches. We focus on bumble bees not only because they are among the most important wild pollinators in North America, but also because they are an emerging, tractable model system for social insect behavior. In addition, bumble bee nests represent a structurally complex, naturalistic imaging environment that may be typical of many biological imaging applications, and where maintaining individual identity can be particularly challenging. Our approach combines automated, long-term monitoring and tag-based tracking with pose estimation to quantify behavior across multiple entire bumble bee colonies over a 48 h period (Fig. 1). We assess two benefits of this integrated approach: (1) improved centroid tracking, using pose data to interpolate through missing tag-tracking data (e.g., occlusions), and (2) improved behavioral resolution from integrating pose tracking. To assess the latter, we quantify an important but understudied aspect of behavior in bumble bees (antennal activity) and how it is affected by pesticide exposure.

Fig. 1

A Schematic design of the temperature-controlled behavioral tracking arena for bumble bee colonies. B An example of BEEtag applied to a colony within the behavioral rig, with associated identities tracked for each readable tag. C Identifying behaviorally relevant features within the colony. Features of the nest were annotated by hand, and their outlines are overlaid on the image

2 Experimental methods

2.1 Trial structure and video collection platform

For behavioral experiments, we collected data from 60 experimental bumble bee (Bombus spp.) colonies, of four different species: B. impatiens (n = 29), B. bimaculatus (n = 15), B. griseocollis (n = 15), and B. perplexus (n = 1). These colonies were reared in the lab under controlled environmental conditions (26 °C, 60% RH) from wild-caught queens collected in spring (April–May) of 2019 in Massachusetts. Colonies were randomly assigned to one of four experimental treatment groups: Control, neonicotinoid pesticide exposure (10 ppb imidacloprid provided continuously in nectar), cold-stress (~ 9 °C for 2 h), or combined cold-stress and pesticide exposure.

Prior to experimental trials, each colony was anesthetized with CO2 and all adult bees (workers and queens) were removed from the nest. Each bee was tagged with a unique BEEtag barcode [13] using cyanoacrylate glue. After tagging, all bees were returned to the nest, which was placed in a high-resolution imaging platform that separately housed four colonies, where they had ad libitum access to pollen and nectar. The imaging rig was a modified version of one previously used for parallel monitoring of bumble bee colonies [15], in turn a modified version of the MAPLE robot [12]. This imaging system consists of a camera mounted to a Cartesian gantry system, with position controlled using Matlab scripts and a SmoothieBoard stepper driver [15]. High-resolution video sequences for subsequent behavioral analysis were recorded from each colony for ~ 48 h. Brief videos (2 min, ~ 1.5 Hz) were recorded using a Point Grey monochrome camera (Grasshopper 3, 3000 × 4096 px). Videos were acquired from each colony in succession by moving the camera between colonies using the robotic gantry, yielding a video from each colony every ~ 9–10 min over the 48-h period [15].

This system was modified to include temperature monitoring and control separately for each colony (Fig. 1A). The housing around each colony was outfitted with additional thermal insulation on the side and bottom walls. Three digital temperature probes (DS18B20) were placed in each colony: two within the inner nest chamber and one in the outer environmental chamber (i.e., the space between the nest arena and the temperature chamber walls, Fig. 1A). Temperature readings were taken from all probes in each colony roughly every 20 s via Matlab scripts and an Arduino Uno microcontroller. Temperature was controlled in the outer environmental chamber of each nest via Peltier thermoelectric heat pumps (TEC1-12710) placed on the bottom of each colony. Peltiers were connected to heat sinks and air-circulating fans on both faces (i.e., one within and one outside the chamber), with control signals generated by a custom PID controller implemented in Matlab. In cold-stressed colonies, temperature challenges were implemented over the course of 4.5 h. Air temperatures in the outer environmental chambers were initially brought to 24 °C for 10 min before ramping down to 9 °C over the course of an hour (0.25 °C/min). Air temperatures were then maintained at 9 °C for 2 h before ramping back up to 24 °C over the course of an hour (also 0.25 °C/min) and finally held at 24 °C for 20 min, after which nest temperatures were not actively controlled. Cold-stressed colonies were exposed to this cold-stress period once a day (twice total over the 48 h monitoring period). Circulation fans were run continuously in all colonies over the entire monitoring period so that any behavioral impacts of air flow or vibration would be constant over the experiment.
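
The cold-stress protocol described above amounts to a piecewise-linear setpoint schedule. The following Python sketch is illustrative only (the original control code was written in Matlab and is not reproduced here); the function name `cold_stress_setpoint` and the convention of returning `None` once active control ends are assumptions made for this example.

```python
from typing import Optional

WARM_C = 24.0          # holding temperature before and after the challenge
COLD_C = 9.0           # cold-stress temperature
RAMP_C_PER_MIN = 0.25  # ramp rate in both directions


def cold_stress_setpoint(t_min: float) -> Optional[float]:
    """Target air temperature (deg C) t_min minutes into the 4.5 h protocol.

    Returns None after 270 min, when temperature is no longer actively
    controlled (an assumed convention for this sketch).
    """
    if t_min < 0:
        raise ValueError("t_min must be non-negative")
    if t_min < 10:       # 10 min hold at 24 deg C
        return WARM_C
    if t_min < 70:       # 60 min ramp down to 9 deg C
        return WARM_C - RAMP_C_PER_MIN * (t_min - 10)
    if t_min < 190:      # 2 h hold at 9 deg C
        return COLD_C
    if t_min < 250:      # 60 min ramp back up to 24 deg C
        return COLD_C + RAMP_C_PER_MIN * (t_min - 190)
    if t_min <= 270:     # final 20 min hold at 24 deg C
        return WARM_C
    return None
```

A PID loop would then compare this setpoint with the outer-chamber probe reading at each update; the controller gains are not reported in the text and are therefore omitted here.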

2.2 Tag tracking and brood mapping

Positions and identities of each visible tag were tracked within each frame using previously established methods [13, 15]. In summary, images were first preprocessed using median-intensity background subtraction to identify regions potentially containing bees. Segmented images were then processed using the BEEtag Matlab package after tracking parameter optimization. Structural elements of the nest (including developing larvae and brood, food storage pots, and nectar reservoirs) were manually mapped for each colony using custom Matlab scripts.
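
As a rough illustration of median-intensity background subtraction, the Python sketch below computes a per-pixel median across frames and flags pixels that deviate from it. It is not the BEEtag preprocessing code: the function names, the nested-list image representation, and the `threshold` parameter are all assumptions made for the example.

```python
from statistics import median


def median_background(frames):
    """Per-pixel median across a list of equally sized 2D grayscale frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


def foreground_mask(frame, background, threshold):
    """True where a pixel differs from the background by more than threshold.

    The threshold is a hypothetical tuning parameter, not a value from the text.
    """
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
```

Because bees move between frames while the nest structure does not, the median recovers the static scene, and the mask marks candidate bee regions to pass on to tag detection. On full-resolution images an array library would be the practical choice; plain lists are used here only to keep the sketch dependency-free.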

2.3 Pose estimation with SLEAP

The Social LEAP Estimates Animal Pose (SLEAP) [11] software package was used for pose estimation. The trained pose estimation skeleton contained 12 anatomical positions of interest on the bee: head, antenna (L, R), upper thorax, centroid, abdomen, and end of the tibia of the forelegs (L, R), midlegs (L, R), and hindlegs (L, R) (Fig. 2A). Our training set consisted of 943 annotated bees from 31 different colonies and 49 frames. Images for training and inference were downsampled in resolution from the original 3000 × 4096 to 1024 × 1024. Within each video trial, the first 10 frames were ignored due to image noise from the robotic gantry, and pose estimation inference was applied to the subsequent 50 frames (~ 33 s) per recorded video.

The U-Net convolutional neural network architecture [16] was used for both the localization of the individual bee (anchor neural network) and estimation of multi-peak confidence maps for labeled body parts (instance neural network) [11]. The centroid of the bee was used as an anchor to identify individuals across frames and to create a cropped region of interest around that anchor as the input for creating confidence maps of the locations of the remaining body parts (Fig. 2A) [11]. To train each model, we ran a maximum of 200 epochs with a batch size of 4, with early stopping if validation loss plateaued. We used the Adam optimizer and an initial learning rate of 1 × 10⁻⁴. All models were trained on Google Colab Pro with an NVIDIA Tesla V100 GPU. We evaluated model training and inference accuracy via mean localization error (i.e., average distance (pixels) between ground truth and predicted node) and object keypoint similarity (OKS, the average of predicted points that are within a precision threshold normalized by object size; a scale of 0–1, with 1 meaning no error [17]). The anchor NN had a mean localization error of 1.5 pixels and an OKS of 0.977. The instance NN had a mean localization error of 7.4 pixels and an OKS of 0.334 (Fig. S1).
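
The two evaluation metrics can be sketched as follows. Mean localization error is computed exactly as defined in the text. The OKS function follows the commonly used COCO-style form, in which each keypoint contributes exp(−d² / (2·area·κ²)); this is an assumption about the general shape of the metric rather than a reproduction of the evaluation code used here, and the default `kappa` value is hypothetical.

```python
import math


def mean_localization_error(truth, predicted):
    """Average Euclidean distance (pixels) between matched ground-truth
    and predicted keypoints, given as lists of (x, y) tuples."""
    distances = [math.dist(t, p) for t, p in zip(truth, predicted)]
    return sum(distances) / len(distances)


def object_keypoint_similarity(truth, predicted, area, kappa=0.05):
    """COCO-style OKS for one instance (assumed form, hypothetical kappa).

    area  : object area in pixels^2, used to normalize error by object size
    kappa : per-keypoint falloff constant, shared by all keypoints here
    """
    terms = [
        math.exp(-math.dist(t, p) ** 2 / (2 * area * kappa ** 2))
        for t, p in zip(truth, predicted)
    ]
    return sum(terms) / len(terms)
```

Under this form, a perfect prediction scores 1, and a fixed pixel error is penalized more heavily on a small object than on a large one. That is one reason a fine-scale structure such as an antenna can score a low OKS even when the error in pixels is modest.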

Fig. 2

A Examples of predicted pose estimation of a single representative bee and B bumble bee colony with pose estimation overlay

2.4 Integration of pose estimation with tag tracking

Next, we integrated pose estimation and tag tracking by calculating pairwise Euclidean distances between detected tag centroids and the centroids of detected pose skeletons within the same frame. The centroid of each pose skeleton was assigned to the nearest BEEtag centroid within a 10 pixel distance cutoff. If a tag was occluded in a frame or could otherwise not be read, the tag centroid for that individual in that frame was classified as missing (Fig. 3A). In some instances a tag was unreadable but a pose skeleton was still predicted on the bee, allowing the pose estimation model to serve as a redundant method to track individuals when their tag was occluded. We used a nearest-neighbor algorithm to pair the last known location of the bee with the nearest unpaired pose skeleton in the subsequent frame in which the bee's location was unknown. These unpaired skeletons were ranked according to proximity to other bees in that frame with missing data before assigning the closest BEEtag to that skeleton. After applying this interpolation method, our dataset of paired BEEtag and pose skeleton data increased from 3.0 × 10⁶ to 4.4 × 10⁶ bee-frames (~ 50% increase), and the average number of frames in which a bee was tracked per video increased from 23 to 34 (Fig. 3B). We then manually scored the accuracy of pairing pose skeletons with BEEtags across 2755 randomly selected instances of missing data and found 66 instances of incorrect pose-to-BEEtag assignment, a ~ 2.4% error rate.
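
The two matching steps can be sketched in Python as below. This is a simplified illustration, not the analysis code: the function names and data layouts are assumptions, the first function does not enforce one-to-one matches, and the gap-filling step uses a plain greedy nearest-first assignment in place of the ranking procedure described above.

```python
import math


def pair_by_cutoff(tags, skeletons, cutoff_px=10.0):
    """Assign each skeleton centroid to the nearest tag centroid within cutoff.

    tags      : dict mapping tag_id -> (x, y) tag centroid in this frame
    skeletons : list of (x, y) pose-skeleton centroids in this frame
    Returns a dict mapping skeleton index -> tag_id.
    """
    pairs = {}
    for idx, centroid in enumerate(skeletons):
        nearest = min(
            tags, key=lambda tag_id: math.dist(tags[tag_id], centroid), default=None
        )
        if nearest is not None and math.dist(tags[nearest], centroid) <= cutoff_px:
            pairs[idx] = nearest
    return pairs


def fill_missing(last_known, unpaired):
    """Greedy gap filling (a simplification of the ranking in the text).

    last_known : dict tag_id -> (x, y) last known position of a bee whose tag
                 could not be read in the current frame
    unpaired   : dict skeleton index -> (x, y) centroids left without a tag
    Returns a dict mapping tag_id -> skeleton index.
    """
    candidates = sorted(
        (math.dist(position, centroid), tag_id, idx)
        for tag_id, position in last_known.items()
        for idx, centroid in unpaired.items()
    )
    assigned, used = {}, set()
    for _, tag_id, idx in candidates:
        if tag_id not in assigned and idx not in used:
            assigned[tag_id] = idx
            used.add(idx)
    return assigned
```

In use, `pair_by_cutoff` would run on every frame first; skeletons it leaves unmatched become the `unpaired` input to `fill_missing`, together with the last known positions of bees whose tags were unreadable in that frame.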

Fig. 3

A Centroid path for a single individual across 50 frames. Locations tracked with BEEtag are displayed in blue. Dashed white line and pink points show data interpolated by integrating pose estimation and BEEtag. B For each individual bee, the number of frames in which that bee is identified across videos before and after integrating BEEtag and SLEAP pose estimation, showing a significant increase in tracked bees per video; Wilcoxon signed rank test (t statistic: 8.528, p = 1.01 × 10⁻¹⁰) (color figure online)

2.5 Longitudinal tracking of individual identities and associated pose estimations show colony responses to cold

We quantified shifts in behavior in response to cold using both tag-alone and combined tag- and pose-based centroid tracking. We quantified two behaviors based on intersection with key nest elements, following previous work [14]: (1) nursing, when bees were in proximity to the brood, and (2) foraging, when bees were in proximity to nectar or pollen sources. For each, we calculated the proportion of individuals performing the behavior as a function of time (Fig. 4, Fig. S2). The fraction of workers foraging decreased at lower temperatures, while the fraction nursing increased, likely reflecting an increase in incubation in response to cold [18]. We also found that, in some cases, improving centroid tracking (by integrating pose estimation) yielded stronger correlations and greater statistical significance. For example, the positive relationship between temperature and foraging behavior increased in magnitude and significance after integrating pose tracking (Fig. S3A, B; BEEtag only: Pearson r = 0.37, p = 0.025; BEEtag + pose: Pearson r = 0.56, p = 0.006).
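
A minimal Python sketch of this kind of analysis is given below. It approximates each mapped nest element as a circle, which is an assumption made purely for illustration (the nest features in this study were hand-drawn outlines), and the function names are hypothetical.

```python
import math


def fraction_in_regions(bee_positions, regions):
    """Fraction of tracked bees whose centroid falls inside any region.

    bee_positions : list of (x, y) centroids at one timepoint
    regions       : list of (cx, cy, radius) circles standing in for
                    mapped nest elements (an assumed simplification)
    Returns None if no bees were tracked at this timepoint.
    """
    if not bee_positions:
        return None
    inside = sum(
        1
        for position in bee_positions
        if any(math.dist(position, (cx, cy)) <= radius for cx, cy, radius in regions)
    )
    return inside / len(bee_positions)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Computing `fraction_in_regions` once per video and correlating the resulting series with nest air temperature gives a per-colony relationship of the kind reported above. Because the denominator is the number of tracked bees, recovering more centroids through pose integration changes the estimate directly.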

Fig. 4

A Timeline of thermal stress experiment for a single colony. Vertical lines show when the camera is triggered to record for 2 min, shown for a single colony. B Nest air temperature over 48 h within the selected colony across the experiment. C–E Colony behavior over time, with trend lines showing the mean fraction of detected bees on the brood (C), foraging on pollen or nectar (D). Darker lines represent the moving average across 5 timepoints

2.6 Application: quantifying antennal activity

We applied tag tracking with pose estimation to an important but understudied behavior in bumble bees: antennal activity. While tag tracking alone can quantify overall locomotor activity of an individual bee (i.e., centroid movement) and many key behaviors (such as nursing, broadly defined, as demonstrated above), not all behaviors are quantifiable via centroid movements. For example, in bumble bees, antennal activity is independent of whole-body locomotion, and is tied to stimulation level that differentiates distinct behavioral states (including sleep [19]) [20]. We quantified antennal movement rates while bees were immobile (i.e., centroid not moving) for each bee within each trial. Antennal activity was quantified as the proportion of time when antennae were moving between consecutive frames. Movement was quantified by thresholding frame-to-frame movement speeds of both the centroid and both antennae, with the threshold for movement (vs. digital noise in tracking) determined from the bimodal distribution of frame-wise speeds for these body parts (Fig. S4).
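
The thresholding procedure can be sketched in Python as follows. The sketch assumes complete tracks with no missing frames, a single shared threshold for all body parts, and that movement of either antenna counts as antennal activity; none of these details are specified in the text, and the function names are hypothetical.

```python
import math


def frame_speeds(track):
    """Frame-to-frame displacement (pixels) for a list of (x, y) positions."""
    return [math.dist(a, b) for a, b in zip(track, track[1:])]


def antennal_activity(centroid, left_antenna, right_antenna, threshold_px):
    """Proportion of immobile frame steps in which an antenna moved.

    A step counts as immobile when centroid displacement is below
    threshold_px, and as antennal movement when either antenna's
    displacement reaches threshold_px (assumed conventions).
    Returns None if the bee was never immobile.
    """
    body = frame_speeds(centroid)
    left = frame_speeds(left_antenna)
    right = frame_speeds(right_antenna)
    immobile = [i for i, speed in enumerate(body) if speed < threshold_px]
    if not immobile:
        return None
    moving = sum(
        1 for i in immobile if left[i] >= threshold_px or right[i] >= threshold_px
    )
    return moving / len(immobile)
```

In practice the threshold would be read off the trough of the bimodal speed distribution rather than chosen by hand.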

We then used this approach to examine whether individual bees show repeatable individual variation in their rate of antennal activity when immobile at normal temperatures (22–28 °C). We found strong evidence for repeatable variation in antennal movement rates among individual bees (Fig. 5, Kruskal–Wallis test, χ² = 1106, df = 672, p < 2.2 × 10⁻¹⁶). While previous work has demonstrated stable individual variation related to overall locomotor activity in bumble bees [14], this result demonstrates an additional axis of individual variation that could be related to division of labor and stimulation [19].

Next, we quantified the impacts of exposure to a common neuroactive insecticide (imidacloprid, a neonicotinoid pesticide) on antennal activity during cold exposure (10–16 °C). We focused on cold exposure because temperature can modulate the impacts of neonicotinoids on locomotor activity [21]. We found strong evidence that imidacloprid exposure increases rates of antennal activity when bees are immobile (Fig. 6). This could reflect altered olfactory processing, potentially consistent with recent work showing that imidacloprid specifically impairs olfactory (and not visual) learning in bumble bees [22].

Fig. 5

Individual variation in rates of antennal activity during immobility for 673 individual bees, under room temperature (22–28 °C) conditions. Vertical blue lines show the 25th and 75th percentile for individual bees across trials, ranked by median movement rate (color figure online)

3 Discussion

Our approach allowed us to perform semi-continuous video recording and behavioral quantification of uniquely identified individual workers within bumble bee (Bombus spp.) colonies over nearly 2 days, using an automated imaging platform [12, 13]. Our results show that a relatively simple integration of two established approaches (fiducial tag tracking and pose estimation) has significant advantages, including improving overall centroid tracking performance (~ 50% increase over tracking with tags only) and analysis of subtle aspects of behavior (e.g., antennal movements) within individual animals over extended time periods, multiple trials, and in response to environmental perturbations (Fig. 6).

Fig. 6

Difference in rate of antennal activity while bees are immobile during cold-stress periods (10–16 °C) between colonies exposed to 10 ppb imidacloprid (right) and control sucrose (left) colonies. Linear mixed effects model, Imidacloprid vs. control as fixed effects, colony as random effect; N = 205, 21 groups (colonies). df = 17.4, t = 2.70, p = 0.015

Using SLEAP, we trained a top–down pose estimation framework, yielding accurate centroid detection via the anchor neural network (0.96 OKS) and an instance neural network (0.33 OKS) for localizing 12 different anatomical positions on a bee. Improvement in prediction may come from refined model tuning and from creating a larger training set representing more of the complexity observed across the entire dataset. However, our results demonstrate the potential for identifying biologically meaningful responses to environmental perturbations (e.g., antennal activity), even with relatively limited accuracy in pose estimation. Such pipelines for multi-animal pose tracking across a large number of colonies lay the groundwork for integration with ongoing work on unsupervised behavioral classification [23, 24] to create more complete representations of behavior at scale and in complex environments.

While we focus here on bumble bees, this general approach is likely to apply to a variety of social insects (and other animal groups). For example, tag-based tracking is already established for many species of social insects, including honey bees (Apis mellifera) and several species of ants [4, 25]. Collective behavioral thermoregulation is widespread in social insects (including honey bees [26]), and our approach could improve understanding of these behaviors. The combination of BEEtag and SLEAP pose estimation represents a high-throughput pipeline for the discovery and quantification of ethologically relevant stress responses in social insects.