Background

The field of wildlife tracking and bio-logging using electronic devices has experienced explosive growth in the last two decades thanks to important technological advances and decreasing costs, which have made it a widely accessible approach for the study of movement, both in terrestrial and aquatic animals [1, 2]. Despite the dawn of this “golden age of animal tracking,” large whales remain among the most difficult species to study using these technologies. Whales are large, highly streamlined animals that cannot be captured for tag attachment. In addition, deploying and recovering electronic devices at sea on animals that have the capability to move great distances in short periods of time involves complicated and expensive logistics [3]. Furthermore, as with other air-breathing aquatic animals that spend most of their time underwater, data transmission to earth-orbiting satellite platforms is limited to the brief periods that a whale is at the surface and a receiving satellite is simultaneously passing overhead. While these challenges have been partially overcome at fine scales by high-resolution (but short-duration) archival tags (e.g., [4,5,6]), the constraints associated with the transmission of large amounts of data over extended times together with the limited functionality of longer duration tags has led to a dearth of behavioral data at large spatio-temporal scales. This situation has restricted the scope of questions that have been addressed regarding whale distribution, movement, behavior, and ecology.

The primary technology for tracking the long-distance movements of large whales, in use since 1997, has been a “consolidated” tag design [7] linked to the Argos satellite system. The electronics and retention elements of consolidated tags are typically incorporated into a single tag housing that is implanted on the whale’s dorsal surface, from which only the antenna and a salt-water switch are external to minimize hydrodynamic drag [3, 8, 9]. These non-recoverable tags can stay attached for long periods of time (typically several months) before they fall off. In addition to locational data, some consolidated tag models have included capabilities for reporting surfacing intervals and summarized dive behavior data in the form of time spent in discrete depth or temperature intervals (i.e., “histogram data”) or as per-dive metrics, such as dive duration and maximum dive depth [10,11,12,13,14]. While such summaries have provided useful information on overall diving behavior and time budgets, these data cannot be used to make inferences about actual prey encounters or prey captures, and so their value in terms of understanding feeding behavior is limited.

Most of what we know about large whale diving behavior comes from short- and medium-duration archival tags, which permit collection of continuous, sensor-rich data streams. For example, from these bio-loggers we have learned that prey capture events in several whale species are often associated with rapid changes in motion that can be identified by their stereotypical signatures in accelerometer data [15,16,17,18]. This information has been used to examine topics, such as the kinematics of feeding behavior [6, 19, 20], feeding strategies in relation to the local prey field [21,22,23,24,25], or behavioral responses to anthropogenic activities [26,27,28,29]. However, the typical deployment period of these devices is limited to < 24 h (if attached with suction cups) or to a few days or weeks (if attached with subdermal anchors), and recovery of the tags is required to download the complete data record [4, 26, 29,30,31,32].

At broader scales, whale foraging behavior has only been inferred indirectly from tracking data, either from the characteristics of horizontal movement (e.g., from concentrated locations [33]) or from summarized data (e.g., per-dive or histogram data [10, 11]). However, the relationship between inferred and direct measurements of whale feeding behavior across spatial and temporal scales remains unverified [34] and may vary with the scale of observation (i.e., the “grain size”; [35, 36]). As movement behavior is driven by resource tracking and food acquisition through a hierarchy of processes that operate at multiple spatio-temporal scales [37], obtaining this information is essential to quantify the variability of resource dynamics and foraging strategies, to test predictions on emergent behavior, and to improve our overall understanding of how whales perceive and respond to their environment [2, 38]. Now more than ever, this information can be critical to guide biodiversity conservation in the face of rapid global change [1, 2, 38].

Here we present a new satellite telemetry device for tracking the movements and dive behavior of large whales over several months without tag recovery. The tag, manufactured by Telonics, Inc. (Mesa, Arizona, USA), collects dive duration from a salt-water switch, inertial motion from a tri-axial accelerometer, and dive depth from a pressure transducer. The tag uses a microprocessor-based approach [39] featuring data processing software for (a) detecting behavioral events from accelerometer data using an adaptive algorithm to account for individual variation in behavior, and (b) summarizing and compressing dive data streams for transmission through the Argos system. While satellite-linked tag platforms featuring accelerometers and associated software have been recently developed for detecting, abstracting, and transmitting behavioral measures of activity in other marine top predators [40,41,42], this is the first time that accelerometers with event detection software were used on a satellite-transmitting tag for large whales.

Methods

Tag development

Development of the tag proceeded incrementally between 2015 and 2017, during which time we tested a variety of tag and software configurations in collaboration with Telonics. The initial model (RDW-640) used the salt-water switch to distinguish between dives and surfacings, and these data were compressed and relayed via Argos. This model also included a tri-axial accelerometer, but the software for processing this data stream was not yet developed, so the sensor was not active. A subsequent model (RDW-665) added a pressure sensor, increased battery capacity, and implemented a behavioral event detection algorithm to analyze the accelerometer data stream in real time. The updated model generated a data summary for every dive, consisting of dive duration, maximum dive depth, and number of behavioral events detected. Software onboard the tag’s microprocessor packaged these summaries into messages for transmission through Argos, completing the development of the device. As the RDW-665 model included all components and configurations that were used by the RDW-640 model, from this point forward we refer to all tag versions as the “RDW tag” unless specifically noted.

Tag components and design

The RDW tag follows the same design as other consolidated tags for large whales in use since 1997 [3], which consists of a main body, an antenna and external sensor endcap at the distal end, a penetrating tip at the proximal end, and an anchoring system (Fig. 1). The main body consists of a stainless-steel cylinder 18.5 cm in length × 1.9 cm in diameter that houses a motherboard, a certified Argos transmitter (401.650 MHz ± 30 kHz operational frequency), a thermistor for internal tag temperature monitoring, a tri-axial accelerometer, and a lithium battery pack (two DL2/3A Duracell® 1550 mAh 3 V cells in parallel). An external flexible whip antenna (15.8-cm long) and a stalked salt-water switch (2.2-cm long), both constructed of single-strand nitinol (1.27 mm in diameter), are connected to the transmitter and mounted on a polycarbonate endcap (2.6 cm in external length) that seals the distal end of the cylinder with two rubber O-rings. The endcap is held in place by four stainless-steel set screws drilled through the stainless-steel cylinder. The port for the pressure sensor of the RDW-665 model is also mounted on the endcap (Fig. 1).

Fig. 1

Schematic diagram of a fully assembled Telonics RDW-665 tag. Side view (top drawing) shows, from left to right, the distal endcap with the exposed Argos antenna and salt-water conductivity switch, the main body partially coated with antibiotic, and the penetrating tip and anchoring system with two rows of stainless-steel strips (“petals”) in deployed position. Top view (bottom drawing) shows the placing of the salt-water switch and the pressure transducer on the endcap. Both views show the two stop tabs extending laterally from the endcap to prevent the tag from embedding too deeply into the whale

The endcap has two perpendicular stops (1.5 cm long × 0.9 cm wide × 0.6 cm thick) extending laterally to prevent tags from embedding too deeply on deployment or from migrating inward after deployment. The penetrating tip is attached to the main body by a threaded screw (1.17 cm long × 0.64 cm in diameter) and fixed with a set screw to prevent unthreading after deployment. It consists of a polyoxymethylene (Delrin®) nose cone into which a ferrule shaft with four double-edged blades is pressed and secured with a transverse roll-pin to prevent unintentional removal. The anchoring system consists of two rows of ten outwardly curved metal strips (each strip is 3.2 cm long × 0.6 cm wide) mounted on the main body at the nose cone (proximal) end (Fig. 1). Total tag weight is approximately 300 g.

The tag’s cylinder is partially coated with a long-dispersant polymer matrix (Resomer® or Eudragit®) in which a broad-spectrum antibiotic (gentamicin sulfate) is mixed to allow for a continual release of antibiotic into the tag site for an extended time to reduce the chances of infection (Fig. 1). Like other consolidated tags, the RDW tag was designed to be almost completely implantable (except for the perpendicular stops, antenna, and salt-water switch), and is ultimately shed from the whale due to hydrodynamic drag and/or the natural migration out of the tissue as a foreign body response [3]. The expected functional life of the RDW tags is 188 d when transmitting 24 h per day with a 45-s repetition rate, and 95% of time spent underwater [43]. The ethics of the use of consolidated implantable tags on large whales are briefly discussed in the “Ethics approval and consent to participate” section.

Tag sensors

Argos transmissions are only attempted when the tag is above the water’s surface to save battery power [39]. The status of the salt-water switch (wet/dry) is used to record dive start and end times to calculate dive duration. The onboard pressure transducer allows collection of dive depth data with an accuracy of ± 2 m down to 200 m and then to ± 1% of deeper depths. A tri-axial accelerometer is also included, and records data at 8-bit precision (1/64 G), with an accuracy of ± 0.003 G and a dynamic range of − 2 to 2 G. The sampling rate of tag sensors is user-programmable and for our trials it was set to 1 Hz (1 s) for the salt-water switch, 0.2 Hz (5 s) for the pressure transducer, and 4 Hz (0.25 s) for the accelerometer. All versions of the RDW tag used in this study were tested in a water-pressure chamber to depths of 500 m with no failures.

Tag software

  (a) Dive summary algorithm:

    Dive behavior is continuously recorded and summarized for “selected dives,” defined as dives meeting user-specified criteria for depth and duration, to generate “dive summaries”. For the tag deployments presented here, selected dives were identified as dives > 2 min in duration and > 10 m in depth. Summary parameters including the start date, time, and duration of each selected dive were recorded, along with the maximum depth of the dive. Other possible dive depth-related metrics can be reported by the tag, such as a summarized profile of individual dives based on a subset of inflection points or the percentage of time spent in user-defined depth bins [44], but we did not record them in this study.

  (b) Event detection algorithm:

    The RDW tag can optionally be programmed for behavior event detection within selected dives, using the accelerometer either to detect rapid changes in motion, such as those often associated with lunge-feeding, or to provide a more general measure of activity based on the variability of the accelerometer data. The results are included in the dive summary. Accelerometer sensor data are processed by the tag’s microprocessor using an adaptive event detection algorithm. Threshold parameters for the event detection algorithm are continually updated from the sensor data stream and inform future iterations of the algorithm, allowing it to adapt over time. This adaptability can account for differences in tag placement on the whale’s body, which can affect the magnitude and potential offsets of a sensor’s signal owing to site-specific differences in acceleration and mechanical processes [17, 45, 46]. The event detection algorithm was specifically developed to detect lunge-feeding behavior in rorqual whales (family Balaenopteridae), which produces strong stereotypical signatures in acceleration data [47, 48] that can be used as a measure of feeding effort.

    For selected dives, events are inferred from the change in the acceleration vector (“jerk”), which for high-resolution archival tags is calculated as the norm of the difference in consecutive acceleration readings [17]. However, for this application, the jerk calculation is integrated over one full second (four measurements) by taking the magnitude of the vector difference in the current accelerometer readings from those one second previous. This variation is used to standardize each measurement to 1 s and to reduce the effect of spurious readings. In addition, accelerometer readings from the first 5 s and final 5 s of each selected dive are excluded to eliminate artifacts from fluke stroking associated with the start or end of a dive [19], as well as from ocean surface wave drag [20].

    The development of the event detection algorithm went through two iterations:

    • Version 1: A study by Simon et al. [19] showed that rorqual feeding lunges produce distinct peaks in jerk, so the initial event detection algorithm identified jerk values that exceeded the mean jerk by a threshold of 3.5 standard deviations (sd), with a 30 s blanking time [17] between identified events to account for prey handling. If the threshold was exceeded multiple times during the blanking time, only the first instance was recorded. Software in the tag’s microprocessor allowed mean and sd of jerk values to be continually updated following each selected dive, making them the mean and sd of jerk for all selected dives up to that point. By updating criteria to identify lunge-feeding events, the algorithm was able to adapt over time and converge on threshold values that better accounted for individual differences in accelerometer readings.

    • Version 2: A subsequent study by Allen et al. [16] indicated that rorqual feeding lunges were best characterized by a jerk value above a specified threshold (jerk maximum) followed by a value below a lower threshold (jerk minimum), so we updated the event detection algorithm to identify instances when the jerk value exceeded a threshold of 1.5 sd above the mean, followed by a value less than one half of the mean occurring within 30 s after the jerk peak. Jerk values had to exceed the upper threshold for 2 s to qualify as a lunge-feeding event, to account for transitory crossings possibly generated by error. Lunge-feeding events for each selected dive were then counted after applying a 35 s blanking time, which retained the first event if multiple ones were detected. As with version 1, the threshold mean and sd jerk values were updated following each selected dive. Thresholds and blanking times were chosen based on those from Allen et al. [16], but modified to be more conservative due to the lack of additional information provided by a hydrophone in the high-resolution archival tags used in Allen et al. and the lower sampling rate of RDW tags (4 Hz versus 50–500 Hz). Further description of the lunge detection methodology is presented in Irvine et al. [24].

  (c) Data transmission via Argos:

    The RDW tag makes use of a highly compressed data transmission protocol to increase throughput of summarized dive data via Argos. Dive summaries are collected into “dive summary messages,” each consisting of a variable number (typically four to ten) of consecutive selected dives, depending on the number of reported summary parameters and other data compression factors such as the similarity of data values being reported. The tag maintains a buffer that holds up to ten dive summary messages in the tag’s microprocessor random-access memory. When enough dive summaries are recorded to create a new dive summary message, it is added to the buffer. If there are already ten messages in the buffer, the oldest message is discarded to make space for the new message.

    Tag transmissions can contain either one dive summary message (randomly selected from the buffer) or a utility message consisting of the tag’s current internal temperature and voltage for diagnostic purposes. The update to version 2 of the event detection algorithm also added the current jerk mean and sd values into utility messages to monitor trends in those criteria over time.
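The version 2 event detection rule described under (b) can be illustrated with a minimal Python sketch. This is not the tag's firmware: the function and class names are hypothetical, the running mean and sd are computed as simple population statistics over all jerk values seen so far, the 30-s search window is measured from the end of the above-threshold run, and the first dive is used only to seed the statistics. All of these are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

RATE_HZ = 4  # accelerometer sampling rate used in the trials


def jerk_1s(acc: Sequence[Tuple[float, float, float]],
            rate_hz: int = RATE_HZ) -> List[float]:
    """Magnitude of the vector difference between each reading and the one 1 s earlier."""
    return [math.dist(acc[i], acc[i - rate_hz]) for i in range(rate_hz, len(acc))]


class RunningStats:
    """Running mean and (population) sd of all jerk values seen so far (assumed form)."""

    def __init__(self) -> None:
        self.n, self.s, self.ss = 0, 0.0, 0.0

    def update(self, values: Sequence[float]) -> None:
        for v in values:
            self.n += 1
            self.s += v
            self.ss += v * v

    @property
    def mean(self) -> float:
        return self.s / self.n

    @property
    def sd(self) -> float:
        return math.sqrt(max(self.ss / self.n - self.mean ** 2, 0.0))


def detect_events_v2(jerk: Sequence[float], mean: float, sd: float,
                     rate_hz: int = RATE_HZ, hold_s: int = 2,
                     window_s: int = 30, blank_s: int = 35) -> List[int]:
    """Return sample indices at which candidate lunge events start."""
    upper, lower = mean + 1.5 * sd, 0.5 * mean
    hold, window, blank = hold_s * rate_hz, window_s * rate_hz, blank_s * rate_hz
    events, i, n = [], 0, len(jerk)
    while i < n:
        if jerk[i] <= upper:
            i += 1
            continue
        j = i
        while j < n and jerk[j] > upper:  # extent of the run above the upper threshold
            j += 1
        # Require a sustained crossing followed by a drop below the lower threshold.
        if j - i >= hold and any(v < lower for v in jerk[j:j + window]):
            events.append(i)
            i = max(j, i + blank)  # blanking time: skip ahead, keeping the first event
        else:
            i = j
    return events


def process_dive(acc: Sequence[Tuple[float, float, float]], stats: RunningStats,
                 rate_hz: int = RATE_HZ, trim_s: int = 5) -> int:
    """Count events in one selected dive, then update the adaptive thresholds."""
    trimmed = acc[trim_s * rate_hz: len(acc) - trim_s * rate_hz]  # drop first/last 5 s
    jerk = jerk_1s(trimmed, rate_hz)
    n_events = len(detect_events_v2(jerk, stats.mean, stats.sd, rate_hz)) if stats.n else 0
    stats.update(jerk)  # thresholds are updated following each selected dive
    return n_events
```

Calling `process_dive` once per selected dive, with a single shared `RunningStats` instance, reproduces the adaptive behavior in outline: each dive is evaluated against thresholds derived from all preceding selected dives.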

Validation approach

In principle, field validation of data collected by the RDW tag would involve a quantitative comparison of the dive summaries obtained through Argos with equivalent summaries generated from data recorded onboard the tag after a deployment. However, consolidated tags are not designed for recovery [3, 7], so this was not an option. Instead, in addition to verifying sensor functionality in the laboratory, we implemented a validation by proxy of our event-detection algorithm by running the tag software on a continuous data record obtained by an archival tag under field conditions. In addition, we evaluated the impact of transmission regimes on data recovery via Argos using simulations compared to empirical data obtained from field deployments of the RDW tag.

Sensor functionality

During development, we tested prototype tags in the laboratory to evaluate sensor functionality and ability to report dive summaries through the Argos system. We replicated dives of varying depths, duration, and complexity by closing the salt-water switch and placing the tags in a pressurized chamber to replicate water depth. Enough dives were replicated to fill multiple dive summary messages, which were then transmitted during an Argos satellite pass and used to confirm agreement between the maximum dive depths and durations of replicated dives and the corresponding tag-summarized values. In addition, diagnostic software in the tag allowed direct download of a short-duration segment of the continuous accelerometer record, which we used to confirm the sensor’s ability to record rapid changes in orientation and acceleration resulting from a person manipulating the tag to simulate abrupt motion changes.

Dive summary and event detection algorithms

We implemented a proxy validation of the tag’s dive summary and event detection algorithms. For this purpose, we used a continuous data record from a Wildlife Computers TDR10-F medium-duration archival tag (hereafter “TDR10 archival data”) deployed on a blue whale (Balaenoptera musculus) for 17.8 d, while it was foraging off southern California in summer 2017 [26, 31]. We examined the performance of the RDW tag dive summary and event detection algorithms by running the TDR10 archival data record through the RDW tag’s algorithms and comparing this output to the corresponding dive summaries calculated from the TDR10 archival data using standard analytical workflows for dive data. Analyses were implemented in the R software for statistical computing, v. 4.0.2 [49]. We note that, while the high resolution and precision of archival data are commonly used to investigate cetacean behavior, they are not guaranteed to fully represent the true behavior of a tagged whale. Thus, “known” events in the proxy validation analysis will refer to events known from the TDR10 archival data, rather than the true behavior of the whale.

As the TDR10 archival tag continuously recorded pressure (depth) and tri-axial accelerometer data at 32 Hz, we decimated the sensor data to 4 Hz to match the sampling rate of the RDW tag. In addition, the TDR10 archival accelerometer data were recorded with a native precision of 1 mG, so we subsequently reduced it to 1/64 G to match the precision of the RDW tag observations. We identified “TDR10 dives” as those > 10 m depth using the find_dives() function from the R package tagtools [17, 50]. We then calculated the maximum depth and duration for each dive, as well as their start and end times using custom R scripts. Feeding lunges were identified manually using stereotypical kinematic signatures from the 32-Hz accelerometer data (i.e., the animal’s depth, pitch, roll, and speed, [51]).
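The two preprocessing steps in this paragraph (reducing 32-Hz data to 4 Hz and coarsening 1-mG readings to 1/64 G) can be sketched as follows. The source does not state which decimation filter was used, so the block average below is an assumption standing in for it, and the function names are hypothetical. The clipping range is simplified to the nominal ± 2 G.

```python
from typing import List, Sequence


def decimate_by_mean(samples: Sequence[float], factor: int = 8) -> List[float]:
    """Reduce the sampling rate by averaging non-overlapping blocks (32 Hz / 8 = 4 Hz)."""
    n_blocks = len(samples) // factor
    return [sum(samples[k * factor:(k + 1) * factor]) / factor for k in range(n_blocks)]


def quantize(value_g: float, step: float = 1 / 64,
             lo: float = -2.0, hi: float = 2.0) -> float:
    """Clip to the sensor's dynamic range and round to the nearest 1/64 G."""
    clipped = min(max(value_g, lo), hi)
    return round(clipped / step) * step


# One axis of accelerometer data in G: decimate, then coarsen the precision.
axis_32hz = [0.5004] * 16
axis_4hz = [quantize(v) for v in decimate_by_mean(axis_32hz)]
```

Skipping the `quantize` step leaves the decimated data at their native precision, which is the basis for repeating the validation at 1 mG.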

We used the RDW dive summary and event detection algorithms to generate dive summaries from the TDR10 archival data (hereafter “RDW dives”) as they would be received through Argos during an in-situ deployment (i.e., dive start date-time, maximum dive depth, dive duration, and number of lunge-feeding events) based on the decimated 4-Hz pressure and 1/64-G accelerometer data. To validate the RDW dive summary algorithm, we matched known dive summaries from the TDR10 dives to corresponding RDW dives using the dive start date-times and used linear regression to quantify the correspondence between maximum dive depth and dive duration values. For the number of lunge-feeding events per dive, we used polychoric correlation to assess the relationship between the number of RDW-detected lunge-feeding events and the number of known feeding lunges identified in the TDR10 archival data summary. This analysis provided an approximation of a Pearson’s correlation coefficient for two ordinal variables and was conducted using the R package polycor v. 0.7-10 [52].

The ordinal nature of the number of lunge-feeding events per dive limited the utility of typical classification analyses, such as confusion matrices. Instead, we grouped known dives from the TDR10 archival data for each ordinal level of feeding lunges per dive, and calculated a false negative rate as the number of false negatives in the RDW-detected lunge-feeding events divided by the number of dives in the TDR10 archival data summary for that level. A similar procedure was conducted to calculate the false positive rate. These values represent the mean number of lunge-feeding events missed per dive and the mean number of incorrectly identified lunges per dive, respectively, for each ordinal level.
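A minimal sketch of this per-level calculation is given below. It assumes that the false negatives for a dive are the known lunges in excess of those detected, and the false positives are the detected events in excess of those known; the source does not spell out this per-dive bookkeeping, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def per_level_error_rates(known: Sequence[int],
                          detected: Sequence[int]) -> Dict[int, Tuple[float, float]]:
    """For each known lunges-per-dive level, return (mean missed, mean extra) per dive."""
    n = defaultdict(int)
    missed = defaultdict(int)
    extra = defaultdict(int)
    for k, d in zip(known, detected):
        n[k] += 1
        missed[k] += max(k - d, 0)  # assumed definition of false negatives in a dive
        extra[k] += max(d - k, 0)   # assumed definition of false positives in a dive
    return {k: (missed[k] / n[k], extra[k] / n[k]) for k in sorted(n)}


# Four dives with 2, 2, 0, and 3 known lunges, and the corresponding detections.
rates = per_level_error_rates([2, 2, 0, 3], [1, 3, 1, 3])
```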

As indicated above, the precision of the accelerometer sensor of the RDW tags was limited to 1/64 G, while the TDR10 accelerometer data were natively recorded at 1 mG. This allowed us to repeat the proxy validation process at the higher precision to determine if sensor precision influenced the accuracy of the RDW event detection algorithm, offering the opportunity to improve future versions of the tag.

Synthetic metrics of feeding behavior

Ecological studies often require data to be considered at different grain sizes to investigate emergent patterns at different domains of scale [53,54,55]. The RDW tag was developed to monitor large whale feeding behavior across ecologically relevant spatio-temporal scales (i.e., 10s to 1000s of km and days to months), such that further synthesis of event data into coarser-grained metrics of feeding behavior may offer insights at broader scales [35, 54, 55]. We evaluated the ability of RDW data to describe patterns of feeding at the per-dive level (i.e., feeding versus non-feeding dives) and at the per-feeding-bout level (i.e., sequences of feeding dives) relative to the TDR10 archival data.

The classification of RDW-derived feeding dives was validated by comparison to known feeding dives identified from the TDR10 archival data using a confusion matrix calculated with the function confusionMatrix() in R package caret v. 6.0-86 [56]. For this classification, we report the true positive detection rate as the number of correctly classified feeding dives divided by all known feeding dives, the false positive detection rate as the number of incorrectly classified feeding dives divided by all known non-feeding dives, and the accuracy as the sum of correctly classified feeding and non-feeding dives divided by the total number of dives.
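The three reported quantities follow directly from the cells of a 2 × 2 confusion matrix. The sketch below computes them in plain Python rather than with caret; the function name is hypothetical, and a dive is treated as a feeding dive when it contains at least one lunge.

```python
from typing import Dict, Sequence


def feeding_dive_metrics(known_lunges: Sequence[int],
                         detected_lunges: Sequence[int]) -> Dict[str, float]:
    """True positive rate, false positive rate, and accuracy for feeding-dive labels."""
    tp = fp = tn = fn = 0
    for k, d in zip(known_lunges, detected_lunges):
        known_feeding, detected_feeding = k > 0, d > 0
        if known_feeding and detected_feeding:
            tp += 1
        elif known_feeding:
            fn += 1
        elif detected_feeding:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "tpr": tp / (tp + fn) if tp + fn else nan,   # correct feeding / known feeding
        "fpr": fp / (fp + tn) if fp + tn else nan,   # wrong feeding / known non-feeding
        "accuracy": (tp + tn) / (tp + fp + tn + fn) if known_lunges else nan,
    }


metrics = feeding_dive_metrics([2, 0, 1, 0, 3], [1, 0, 0, 1, 2])
```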

To assess temporal trends in feeding intensity through feeding bouts obtained from the RDW and TDR10 dive summaries, we graphically examined the probability density distribution of the period of time between feeding dives to identify a behavioral change point criterion, where the right tail of the distribution stabilized at a low value. Feeding bouts were identified as sequences of dives, where feeding dives were separated by a period no longer than the criterion. A period longer than the criterion was interpreted as the whale changing its behavior, or possibly leaving a feeding patch. We tested the RDW-derived probability density distribution for consistency with the distribution of values from TDR10 data using Bhattacharyya’s similarity coefficient [57, 58], where values < 0.05 and > 0.95 indicate that the distributions are significantly different, or similar, respectively, and intermediate values indicate the probability of overlap between the two distributions [57]. The number of RDW-derived bouts was then compared to the number of known TDR10-derived bouts. As described above for the dive summary and event detection algorithms, the synthetic metrics described in this section were generated at the 1/64-G precision of the RDW accelerometer sensor as well as at the 1-mG precision of the TDR10 sensor.
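Both steps can be sketched briefly. The bout-splitting function below assumes that an isolated feeding dive counts as a bout of its own, and the Bhattacharyya coefficient is computed on two histograms that share the same bins; the function names and the binning are illustrative assumptions, not details from the source.

```python
import math
from typing import List, Sequence


def split_into_bouts(feeding_dive_times: Sequence[float],
                     criterion: float) -> List[List[float]]:
    """Group feeding-dive start times into bouts separated by gaps longer than criterion."""
    bouts: List[List[float]] = []
    for t in sorted(feeding_dive_times):
        if bouts and t - bouts[-1][-1] <= criterion:
            bouts[-1].append(t)   # gap within the criterion: same bout
        else:
            bouts.append([t])     # gap exceeds the criterion: new bout
    return bouts


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Overlap between two histograms on identical bins (0 = disjoint, 1 = identical)."""
    sp, sq = sum(p), sum(q)
    return sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))


# Times in minutes with a 10-min criterion yield three bouts.
bouts = split_into_bouts([0, 5, 8, 40, 43, 100], criterion=10)
```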

Assessment of data recovery via Argos

Recovery of RDW dive summary data via Argos during field deployments is dependent on the coincident occurrence of the whale at the surface, with the tag scheduled to transmit, while a satellite is overhead to receive the transmission. If a dive summary message is replaced by a new message in the transmission buffer without having been received by a satellite, its data are lost. Thus, data recovery (i.e., the number of summarized dives received) will depend on the number of messages received by a satellite over a given time period. Whales making short-duration dives will generate more dive summary messages per day than whales making long-duration dives, filling the transmission buffer more quickly and potentially replacing messages before they are received by a satellite. Furthermore, longer time periods spent at the surface will allow more messages to be transmitted compared to shorter surface periods, increasing the likelihood of a satellite being overhead when a message is transmitted.

To characterize the expected data recovery during an RDW tag deployment we conducted simulation experiments, where time series of dives for a hypothetical whale were generated under a range of behavioral and tag programming regimes, to test their effect on data recovery. The results of these simulations were then compared to empirical values collected from field deployments conducted on blue and humpback (Megaptera novaeangliae) whales off California during summer of 2017. A dive time series was composed of sequences of dives followed by post-dive intervals (PDI) during which a whale is near the surface for a cycle of respirations before the next dive. We generated two representative time series of dives (n = 4000), matched in time to real satellite pass intervals, by sequentially drawing a dive duration from a log-normal distribution, then calculating a corresponding PDI based on predictions from a linear model fit to dive duration and PDI values gleaned from Dolphin [59] and Acevedo-Gutiérrez et al. [60] (PDI = 0.07944 + 0.29333 × duration). A “short-dive” time series simulated a whale making short-duration dives (mean = 3 min, sd = 1.5 min), while a “long-dive” time series consisted of longer duration dives (mean = 9 min, sd = 1.3 min). These values were intended to broadly reflect the dive behavior of humpback [59] and blue whales [60, 61], respectively, although the species designation is ultimately unimportant, as we expect similar results from other species with similar dive behavior. The 4000-dive length of the time series was chosen so the time series would last multiple weeks, allowing for a range of daily satellite pass durations to be incorporated.
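The generation of a dive time series can be sketched as below. Two assumptions are made that the text does not confirm: the stated mean and sd are taken to be on the natural (minutes) scale and converted to log-normal parameters by moment matching, and the PDI equation is applied with both variables in minutes. The function names and the seed are hypothetical.

```python
import math
import random
from typing import List, Tuple


def lognormal_params(mean: float, sd: float) -> Tuple[float, float]:
    """Convert a natural-scale mean and sd to the (mu, sigma) of a log-normal."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def simulate_dives(n: int, mean_min: float, sd_min: float,
                   seed: int = 0) -> List[Tuple[float, float]]:
    """Return n (dive duration, PDI) pairs in minutes."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean_min, sd_min)
    series = []
    for _ in range(n):
        duration = rng.lognormvariate(mu, sigma)
        pdi = 0.07944 + 0.29333 * duration  # linear model given in the text
        series.append((duration, pdi))
    return series


short_dive = simulate_dives(4000, mean_min=3.0, sd_min=1.5)
long_dive = simulate_dives(4000, mean_min=9.0, sd_min=1.3)
```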

For each simulated dive time series, different tag programming regimes were implemented to determine their effect on data recovery. Sequential groups of seven dives from a time series were identified to replicate RDW tag dive summary messages and added to (and later removed from) a simulated transmission buffer holding ten messages. Daily simulated transmissions occurred either during six, 1-h periods (“6-h schedule”) scheduled to coincide with the most likely time a satellite would be overhead, or during alternating hours of the day (“12-h schedule”) to replicate the range of compromises that Argos users often have to make between continuously transmitting and conserving battery power by reducing the number of transmissions. Satellite pass predictions were obtained with the satellite pass prediction tool available to users of the Argos system via their website [62] for the area off Newport, OR, USA (45° N, 124° W).

During each PDI when simulated transmissions were scheduled to occur, a dive summary message was randomly selected with replacement from the transmit buffer every 30, 60, or 120 s (transmit interval) and assumed to have been transmitted. The different transmit intervals are meant to reflect individual and interspecific differences in a whale’s respiration cycle while at the surface, which controls when tags can transmit. Data from a transmitted dive summary message were considered “received” if the transmission time coincided with a predicted satellite pass and it was retained after accounting for an empirically determined message corruption rate of 44% (detailed in the next section). We ran simulations for both short- and long-dive time series with all combinations of tag programming conditions and calculated the proportion of all dive summaries received (after excluding duplicates) compared to the true number of dives in the simulated time series. Hourly plots of the number of received dives were also made to examine temporal trends in data recovery that might be related to patterns of satellite coverage and/or transmit schedule.
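The simulation logic described in this section can be sketched as a single loop over the dive time series. The code below is an illustrative reconstruction, not the authors' implementation: the function name, the representation of satellite passes as (start, end) intervals in seconds, the `scheduled` callback for the transmit schedule, and the choice to attempt the first transmission at the moment of surfacing are all assumptions.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple


def simulate_recovery(dives: Sequence[Tuple[float, float]],
                      passes: Sequence[Tuple[float, float]],
                      scheduled: Callable[[float], bool],
                      interval_s: float = 30.0,
                      dives_per_msg: int = 7,
                      buffer_size: int = 10,
                      corruption: float = 0.44,
                      seed: int = 0) -> float:
    """Proportion of dives whose summary is received at least once.

    dives: (duration_s, pdi_s) pairs; passes: (start_s, end_s) satellite passes.
    """
    rng = random.Random(seed)
    buffer: deque = deque(maxlen=buffer_size)  # oldest message dropped when full
    pending: List[int] = []
    received = set()                            # a set excludes duplicate receptions
    t = 0.0
    for idx, (duration, pdi) in enumerate(dives):
        t += duration                           # whale surfaces; dive is summarized
        pending.append(idx)
        if len(pending) == dives_per_msg:
            buffer.append(tuple(pending))
            pending = []
        tx = t
        while tx < t + pdi:                     # transmissions during the PDI
            if buffer and scheduled(tx):
                msg = rng.choice(buffer)        # random selection with replacement
                overhead = any(s <= tx < e for s, e in passes)
                if overhead and rng.random() >= corruption:
                    received.update(msg)
            tx += interval_s
        t += pdi
    return len(received) / len(dives)


# Example: 70 identical dives, continuous coverage, transmitter always scheduled.
demo = simulate_recovery([(180.0, 60.0)] * 70, [(0.0, 1e9)], lambda t: True,
                         corruption=0.0)
```

A 6-h or 12-h schedule can be expressed through `scheduled`, for example by testing whether the hour of day implied by `t` falls in a permitted set of hours.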

Field deployments

To provide an empirical comparison to proxy validation results and simulated predictions, we obtained the event-detection threshold values and characterized data recovery and message corruption for 28 RDW tags deployed on blue (n = 14) and humpback whales (n = 14) off southern and central California during July–August 2017 [63, 64]. Seven humpback whale tags were RDW-640 (which only reported dive start time and duration), while the other seven tags were RDW-665 (which recorded dive start time, duration, maximum dive depth, and number of lunge-feeding events). All blue whale tags were RDW-665. To conserve battery power and maximize operational tag life, RDW-640 tags were programmed to transmit when at the surface for five 1-h periods each day, while RDW-665 tags were programmed to transmit for six 1-h periods. Both RDW tag models were programmed with a 10-s transmit repetition rate when at the surface. These transmission periods were selected based on satellite pass predictions for the time and location of tag deployment, available to users of the Argos system via their website [62].

For each tag deployment, we used the received data to extract the event-detection threshold values (mean and sd jerk) from utility messages and summarized them based on their minimum, maximum and last values reported. We also identified and removed corrupted messages of all types using the Cyclic Redundancy Check (CRC) code, and determined the proportion of corrupted messages relative to the total number of received messages. We then calculated the number of unique daily dive summary messages received and plotted the number of received dives by hour for comparison to simulated results. We also calculated the percent of both the simulated and empirical time series data that was summarized by received transmissions as the sum of all received dive durations and PDIs divided by the total duration of the time series. During field deployments, dive end times were calculated as the sum of the dive start time and duration, while PDI was calculated as the difference between a dive start time and the end time of the previous dive. As the PDI value for the last dive in a dive summary message cannot be calculated, the complete dive time series cannot be recovered. For this reason, we removed the last PDI value from each received dive summary message in our simulation study to better match the empirical data when calculating the percent of the tracking period summarized.
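The calculation of dive end times, PDIs, and the percent of the tracking period summarized can be sketched as follows. This is an illustrative reconstruction under stated assumptions rather than the analysis code used in the study: the function names are hypothetical, times are in seconds, and the received messages are assumed to be unique and non-overlapping.

```python
def summarize_message(dives):
    """Derive end time and PDI for each dive in one dive summary message.

    dives -- list of (start_s, duration_s) tuples for consecutive dives.
    Returns a list of (start, end, pdi); pdi is None for the last dive,
    because the start time of the following dive is not in the message.
    """
    out = []
    for i, (start, duration) in enumerate(dives):
        end = start + duration
        if i + 1 < len(dives):
            pdi = dives[i + 1][0] - end  # next dive start minus this dive end
        else:
            pdi = None
        out.append((start, end, pdi))
    return out


def percent_summarized(messages, track_duration_s):
    """Percent of the tracking period covered by received dives and PDIs."""
    total = 0
    for dives in messages:
        for start, end, pdi in summarize_message(dives):
            total += (end - start) + (pdi or 0)  # the last PDI contributes nothing
    return 100.0 * total / track_duration_s


# Example with made-up values: one message reporting three dives.
message = [(0, 300), (360, 240), (700, 100)]
coverage = percent_summarized([message], track_duration_s=1600)
```

In this made-up example the message covers 800 s of a 1600-s track, so `coverage` is 50.0.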

Results

Proxy validation of dive summary data

1/64-G precision data

The RDW dive summary algorithm identified 2462 selected dives and 1302 lunge-feeding events (range = 0–11 per dive; Additional file 1: Fig. S1) when implemented on TDR10 archival data at the reduced precision of the accelerometer sensor of the RDW tags used in field deployments (1/64 G). Of these, 753 dives contained at least one feeding lunge, and were consequently classified as feeding dives. A total of 6317 feeding lunges (range = 0–12 per dive) were manually identified in 1345 feeding dives when generating known dive summaries from the TDR10 archival data. There was a near-perfect correlation between the two sets of dive summaries for both maximum dive depth and duration (Spearman’s rank correlation, ρ = 1; Additional file 1: Figs. S2 and S3).

The threshold values (mean and sd of jerk) of the RDW event detection algorithm stabilized quickly and had no variation after about 80 dives over the initial 13 h of the 17.8-d tracking period (mean jerk = 2/64 G/s, sd jerk = 4/64 G/s; Additional file 1: Fig. S4). The number of detected RDW lunge-feeding events per dive was positively correlated with the number of feeding lunges detected in the TDR10 archival data (polychoric correlation r = 0.63; Fig. 2). The false negative rate by lunges per dive ranged from 0.56 to 5.6 (excluding a single 12-lunge dive that the event-detection algorithm missed by 1 lunge), while the false positive rate ranged from 0 to 0.02 (Additional file 1: Table S1).

Fig. 2

Boxplots of the results of the proxy validation of the RDW tag’s event detection algorithm showing the number of detected lunge-feeding events per dive as a function of the known number of feeding lunges per dive determined from a 17.8-d TDR10 archival record of a blue whale. Top panel shows results for 1/64-G data, while the bottom panel shows results for 1-mG data. Width of boxes is proportional to the sample size for that category and the numbers at the top of the plot represent the number of dives in that category. Data points are jittered on both axes for better visibility

The accuracy of the RDW event detection algorithm when classifying feeding/non-feeding dives at 1/64-G precision was 0.74 when compared to known feeding dives from the TDR10 archival data (Table 1). The true-positive detection rate was 0.55, indicating many feeding dives were not correctly identified by the RDW event detection algorithm. However, the false-positive detection rate was 0.018, indicating that, when a feeding dive was identified, it was almost always correctly classified.
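For readers unfamiliar with these metrics, the sketch below shows how accuracy, true-positive rate, and false-positive rate follow from the four cells of a confusion matrix. The function name is hypothetical and the counts in the example are made up for illustration; they are not the values in Table 1.

```python
def classification_rates(tp, fp, tn, fn):
    """Return (accuracy, true-positive rate, false-positive rate).

    tp -- known feeding dives classified as feeding
    fn -- known feeding dives classified as non-feeding
    fp -- known non-feeding dives classified as feeding
    tn -- known non-feeding dives classified as non-feeding
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn)  # share of known feeding dives that were detected
    fpr = fp / (fp + tn)  # share of known non-feeding dives flagged as feeding
    return accuracy, tpr, fpr


# Illustrative counts only (not from Table 1).
accuracy, tpr, fpr = classification_rates(tp=55, fp=2, tn=98, fn=45)
```

With these illustrative counts, just over half of the feeding dives are detected while very few non-feeding dives are misclassified, which is the same qualitative pattern reported for the 1/64-G data.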

Table 1 Confusion matrix showing the classification of feeding and non-feeding dives by version 2 of the RDW-665 event detection algorithm compared to dives summarized from continuous Wildlife Computers TDR10-F archive data

The probability density distribution of time between feeding dives showed a high degree of overlap between RDW and TDR10 data (Bhattacharyya’s similarity coefficient = 0.82). Most times between feeding dives occurred at ≤ 60 min (Additional file 1: Fig. S5), indicating 60 min as a good criterion to identify the end of a feeding bout. Using this criterion, the RDW data generated 70 feeding bouts, 21 of which were single-dive bouts. Since our goal for identifying feeding bouts was to illustrate how coarse-grained metrics of feeding activity could be generated from RDW tag data, single-dive bouts were removed, resulting in a total of 49 feeding bouts, compared to 20 known feeding bouts recorded by TDR10 data (none of which were single-dive bouts).
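The bout criterion can be expressed as a short grouping routine. The sketch below is a hedged illustration, not the authors’ implementation: the function name is hypothetical, each feeding dive is represented by a single time stamp in minutes, and a gap of more than 60 min between consecutive feeding dives is assumed to end a bout.

```python
def feeding_bouts(feeding_dive_times_min, max_gap_min=60, min_dives=2):
    """Group feeding dives into bouts and drop bouts with fewer than min_dives dives.

    feeding_dive_times_min -- time stamp (min) of each feeding dive
    max_gap_min            -- a longer gap between consecutive feeding dives ends a bout
    """
    bouts = []
    current = []
    for t in sorted(feeding_dive_times_min):
        if current and t - current[-1] > max_gap_min:
            bouts.append(current)  # gap exceeds the criterion: close the bout
            current = []
        current.append(t)
    if current:
        bouts.append(current)
    # Remove single-dive bouts (or any bout shorter than min_dives).
    return [b for b in bouts if len(b) >= min_dives]


# Made-up example: the isolated dive at 200 min forms a single-dive bout and is removed.
bouts = feeding_bouts([0, 10, 25, 200, 400, 430])
```

A routine of this kind also shows why false-negative feeding dives inflate the bout count: each missed feeding dive can open a gap longer than the criterion and split one bout into two.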

1-mG precision data

Implementing the RDW event detection algorithm at the native (1-mG) resolution of the TDR10 archive data resulted in the detection of 4452 lunge-feeding events (range = 0–10 per dive; Additional file 1: Fig. S1) and 1372 feeding dives, compared to 6317 known lunge-feeding events and 1345 feeding dives in the TDR10 archive. The threshold values (mean and sd of jerk) of the RDW event detection algorithm converged on initial ranges of ± 5 mG/s for both parameters after about 110 dives over the initial 19 h of the 17.8-d TDR10 tag deployment, and eventually stabilized to consistent values after approximately 1 week (mean jerk = 49.9 mG/s, sd jerk = 67.5 mG/s; Additional file 1: Fig. S4). The number of RDW lunge-feeding events per dive detected from 1-mG data was closely positively correlated with the number of known feeding lunges from the TDR10 archive data (polychoric correlation r = 0.88; Fig. 2). The mean false negative rate by lunges per dive ranged from 0.29 to 1.83 (excluding two single-dive ordinal levels with one and two false negatives), while the false positive rate ranged from 0 to 0.22 (Additional file 1: Table S1).

The accuracy of the RDW event detection algorithm when classifying a feeding dive at the native (1-mG) resolution of the TDR10 archive data was 0.95 when compared to known feeding dives identified from the TDR10 archive data (Table 1). The true-positive detection rate was 0.96, while the false-positive detection rate was 0.067, indicating that the vast majority of feeding dives were correctly classified by 1-mG RDW dives.

The probability density distribution of time between feeding dives showed a high degree of overlap between RDW and TDR10 dives (Bhattacharyya’s similarity coefficient = 0.83). Most times between feeding dives occurred at ≤ 60 min (Additional file 1: Fig. S5), indicating 60 min as a good criterion to identify the end of a feeding bout. Using this threshold, the RDW data generated 31 feeding bouts, 11 of which were single-dive bouts. When single-dive bouts were removed, RDW data reported 20 feeding bouts, which was the same as recorded by TDR10 data.
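Bhattacharyya’s coefficient measures the overlap between two probability distributions and, for discrete distributions p and q, equals the sum over bins of sqrt(p_i × q_i), ranging from 0 (no overlap) to 1 (identical). The sketch below approximates it from binned data; the function name and the choice of histogram bins are assumptions made for illustration, and the study may have estimated the densities differently.

```python
import math


def bhattacharyya_coefficient(x, y, bin_edges):
    """Approximate Bhattacharyya's coefficient between samples x and y.

    Both samples are binned on the same edges (values outside the edges are
    ignored); each sample must have at least one value inside the bins.
    """
    def proportions(values):
        counts = [0] * (len(bin_edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                in_bin = bin_edges[i] <= v < bin_edges[i + 1]
                # The last bin is closed on the right so the top edge is included.
                if in_bin or (is_last and v == bin_edges[-1]):
                    counts[i] += 1
                    break
        n = sum(counts)
        return [c / n for c in counts]

    p, q = proportions(x), proportions(y)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


# Made-up times between feeding dives (min) for two data sets, binned at 0-10, 10-20, 20-60.
edges = [0, 10, 20, 60]
same = bhattacharyya_coefficient([5, 15, 15, 45], [5, 15, 15, 45], edges)
disjoint = bhattacharyya_coefficient([5, 5], [45, 45], edges)
```

Identical samples give a coefficient of 1 and samples with no shared bins give 0, so the reported values of 0.82 and 0.83 indicate substantial, but not complete, overlap.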

Data recovery via Argos

As expected, simulated data recovery increased with decreasing transmit intervals (i.e., more received dive summary messages), while longer mean dive duration and PDI increased the proportion of simulated dive summaries that were recovered (Table 2). The 6-h transmit schedule resulted in the recovery of 80.8–97.5% of long-dive summaries across all transmit intervals compared to 53.2–74.6% for short-dive summaries. This represents a relative increase in data recovery of 20.6% and 40.2%, respectively, for a quadrupling of the dive summary messages transmitted between the shortest and longest transmit intervals. More than 84% of the simulated dive summaries were recovered across both short- and long-dives when the 12-h schedule was used (Table 2), with a relative increase in data recovery of 14.3% for short-dive summaries and only 1.2% for long-dive summaries across the range of transmit intervals. For the short-dive time series, gaps were present in the daily pattern of recovered dives when using the 6-h transmit schedule and were centered on gaps in transmit times, related to satellite coverage (Fig. 3, Additional file 1: Fig. S6). In these instances, most dives were recovered for the period 3–4 h before a scheduled transmission hour. No data gaps were present in the daily pattern of recovered dives for the long-dive time series, although for longer transmit intervals (60 s, 120 s), fewer dives were recovered during hours near the start of long gaps in satellite coverage (Table 2, Fig. 3, Additional file 1: Fig. S6).

Table 2 Summary results for a simulation exercise investigating the effect of different dive behaviors and Argos transmission regimes on data recovery
Fig. 3

Simulation results to test the effect of RDW tag dive summary and transmission regimes on data recovery. Dives were simulated for an animal making relatively short-duration dives (e.g., humpback whales; A) and long-duration dives (e.g., blue whales; B). Dive summary messages were assumed to have been transmitted every 60 s during six 1-h periods (purple horizontal bars) scheduled to coincide with the greatest chance of a predicted satellite pass (C). Dive summaries were assumed to have been received if a satellite was predicted to be overhead when the message was transmitted

Field deployments

Event-detection threshold values received from RDW tag utility messages were generally higher than those calculated by the proxy validation using the TDR10 archival data, with mean jerk thresholds for blue whales ranging from 5/64 to 16/64 G/s (sd jerk range: 9/64–19/64 G/s) and from 9/64 to 19/64 G/s (sd jerk range: 6/64–18/64 G/s) for humpback whales (Additional file 1: Table S2). The median percentage of corrupted messages was 44% (range: 28–58%) for blue whales and 18% (range: 10–32%) for humpback whales (Additional file 1: Table S3). Fewer dive summaries were recovered from RDW tags deployed on blue and humpback whales than would be expected from the simulation of data recovery rates. A mean of 8.8 (sd = 5.2) unique dive summary messages were received per day from tags attached to blue whales, summarizing a mean of 45.8% (sd = 15.0%) of the tracking duration (Table 2, Fig. 4). Fewer unique daily dive summary messages were received from humpback whales tagged with RDW-660 tags (mean = 5.6, sd = 3.3; transmit schedule: five 1-h periods) compared to those tagged with RDW-665 tags (mean = 11.1, sd = 4.6; transmit schedule: six 1-h periods), which resulted in a smaller percentage of the tracking period being summarized (mean = 33.9% versus 71.1%, sd = 15.0% versus 8.3%, respectively). However, by reporting one less parameter (maximum dive depth), RDW-660 tags were able to transmit over 40% more dives per dive summary message compared to RDW-665 tags (mean = 10.1 versus 7.0 dives per transmission, respectively; Fig. 4C).

Fig. 4

Percent of tracking period summarized (A), number of unique dive summary messages received per day (B), and number of received dive summaries per transmission (C) from RDW tags deployed on humpback and blue whales off California during summer 2017

Discussion

Proxy validation of dive summary data

The RDW dive summary and event detection algorithms were able to capture with varying success the observed dive and feeding behavior of a blue whale tracked for 17.8 d with a TDR10 tag, whose archival record was used for validation. Maximum dive depth and duration were reported with high accuracy, while accuracy was low for the 1/64-G precision accelerometer data when reporting the number of lunge-feeding events made during a dive. However, the higher accuracy and very low false-positive detection rate when classifying dives as feeding or non-feeding, as well as the similarity of feeding bout metrics derived from RDW and TDR10 data, indicate that these data are useful for coarser-grained characterizations of feeding behavior. Furthermore, the performance of the event detection algorithm was significantly improved in all aspects by increasing the precision of accelerometer readings from 1/64 G to 1 mG.

The small mean jerk values calculated from the TDR10 archival data (~ 2/64 G/s; Additional file 1: Fig. S4), and used as thresholds to trigger an event detection, meant that the minimum jerk criteria (less than half the mean jerk) could not be resolved well at 1/64-G resolution. The added precision of the 1-mG data better resolved these small values, allowing for a greater number of possible values that met the minimum jerk criteria, resulting in improved detection performance. All new versions of the RDW tag now support 1-mG precision accelerometer data. (Parenthetically, the RDW event detection algorithm at the 1/64-G precision may have performed better during field deployments than our validation results indicated, as empirical mean and sd jerk threshold values were larger than those reported for the TDR10 data, allowing a greater ability to resolve values below the minimum jerk criteria.)
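The resolution argument can be made concrete by counting how many quantized jerk values fall below half the mean jerk at each precision. The sketch below is only an illustration: the function name is hypothetical, it assumes that jerk is quantized in the same steps as the accelerometer reading (1/64 G/s or 1 mG/s), that the criterion is a strict “less than”, and it uses the 2/64 G/s mean jerk from the 1/64-G proxy validation for both cases.

```python
from fractions import Fraction


def resolvable_values_below(limit, step):
    """List the quantized jerk values (non-negative multiples of step) below limit."""
    values = []
    v = Fraction(0)
    while v < limit:
        values.append(v)
        v += step
    return values


mean_jerk = Fraction(2, 64)  # G/s, mean jerk threshold from the 1/64-G proxy validation
limit = mean_jerk / 2        # minimum jerk criterion: less than half the mean jerk

coarse = resolvable_values_below(limit, Fraction(1, 64))   # 1/64-G steps
fine = resolvable_values_below(limit, Fraction(1, 1000))   # 1-mG steps

print(len(coarse), len(fine))
```

Under these assumptions, only a jerk of exactly zero satisfies the criterion at 1/64-G precision, whereas 16 distinct values (0 to 15 mG/s) satisfy it at 1-mG precision.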

Probability density curves for time between feeding dives were generally similar in shape for both 1/64-G and 1-mG data, although when the behavioral change point criterion was applied, 1/64-G data produced more bouts, including numerous single-dive bouts. This result was likely due to the high number of false-negative feeding dives identified by the RDW event detection algorithm at 1/64 G, which divided feeding bouts observed in the TDR10 archival data into multiple shorter bouts. An implementation of this feeding bout analysis is not currently feasible for field deployments of RDW tags, as limitations of satellite coverage, whale surfacing behavior, and other factors affecting tag transmission often result in incomplete time series of dive summaries (see results from the data recovery rate simulation; Fig. 3). However, planned improvements to the Argos satellite constellation, as well as other recent developments to improve reception of satellite transmissions [65, 66], raise the possibility of better data recovery in the future, leading to an improved ability to characterize feeding behavior at varying spatial and temporal scales.

Data recovery via Argos

The simulation exercise suggested that very high data recovery rates should be possible. Although empirical data recovery was lower, both sets of results, and the relative differences in data recovery among them, provided insights into how users can plan deployments and maximize the amount of data collected. The rate of transmissions received by the satellite (reception rate) is the primary constraint on data recovery, so increasing the daily transmit hours will boost data recovery, as demonstrated by the difference in percent of the time series summarized between 6-h and 12-h daily transmission schedules (Table 2). However, the gain in received transmissions from increased transmit hours will come with a trade-off of higher battery consumption rate, and will further depend on species-specific variation in behaviors, such as surfacing rates, which might limit opportunities for a tag to transmit, as well as potential differences in message corruption rate as suggested by our empirical estimates [67].

The time period summarized by each transmission is an additional constraint on the data recovery rate. Assuming reception rates are equivalent, a greater proportion of the tracking period will be summarized from animals making long-duration dives compared to shorter dives (as observed with the long-dive time series). Each dive summary message will report a longer portion of the tracking period, and fewer messages need to be received to summarize the entire track. If a target species makes short-duration dives, the time period recorded by a dive summary message can be expanded to boost data recovery by selecting only longer duration dives to summarize, provided it fits with the research goals. Similarly, the time period summarized by a transmission is dependent on the number of dives it reports, with more dives representing a longer summary time period. By reporting one less parameter, the RDW-660 tags reported a mean of ten dives per transmission, compared to seven by RDW-665 tags. In this case, the gain in data recovery was likely offset by the reduction of transmission hours from 6 to 5 but it offers an additional way to increase data recovery depending on the needs of the study.

Argos satellite coverage is not ubiquitous, and the effect of coverage gaps was observed in the temporal pattern of reduced hourly data recovery (Fig. 3). During longer gaps in satellite coverage, dive summary messages can pass through the transmission buffer without a satellite ever having been overhead, especially if new messages are generated quickly, as was the case for the short-dive time series. Shorter satellite-coverage-related data gaps can be expected for animals making longer-duration dives (as shown by the long-dive time series), as each dive summary message will report a longer portion of the tracking period, bridging more of the gap in satellite coverage.

Our empirical assessment of data recovery from field deployments of the RDW tag was lower than the expectation from our simulation exercise. This is not entirely surprising, as satellite tracking data, especially in the marine realm, are oftentimes affected by poorly understood or undetermined extrinsic factors that result in data gaps and that reduce the amount of data reported (see next section). The transmission schedule for these tags was set as a compromise between data recovery and battery longevity, and future deployments will use different protocols to meet the needs of each specific project.

Limitations

Recovery of data from instrumented animals is a significant hurdle for research, especially when conducted on large whales, which can move > 100 km per day and do not return to a central place where a tag can reliably be recovered [3, 68]. For RDW tags, data recovery is limited by the coincident occurrence of the tagged whale surfacing while an Argos satellite is overhead to receive a transmission. As shown by our simulations, the duration of recorded dives can also affect data recovery, as longer dives will summarize a greater portion of the tracking period with each transmission. Collectively, this means that dive summary time series from RDW tags are rarely complete, and the number of dives reported from each tag can vary widely depending on individual behavior and transmission schedule. Furthermore, even if recovery of the entire summarized time series is achieved, it must be understood that dives not meeting selected dive criteria are not recorded by the RDW tag. Extrinsic factors such as bad weather or biofouling can also affect data recovery by limiting or corrupting satellite transmissions. The biases associated with these dive summaries are related to the degree of irregularity and the scale of the behavior being studied [69, 70], but a more adequate characterization will require dedicated research. For these reasons, RDW tags should be considered to provide a relative index of dive behavior, rather than a continuous and complete record.

We assumed that the received dives were a random sample of selected dives that occurred during the tracking period. However, the process of grouping consecutive dives for transmission may introduce bias through serial correlation, while the whales’ surfacing patterns and tag duty cycling may further contribute to a lack of independence in ways that remain to be characterized. Thus, more research is needed to assess how dive summaries are received, if animal behavior might affect these trends, and how the relevant scales of behavior being studied might be affected. For this reason, in-situ validation of detected events by means of using concurrently attached bio-loggers on whales carrying RDW tags remains a high priority.

Conclusions

The correspondence between RDW and TDR10 dive summaries and their derived feeding bouts in the proxy validation exercise demonstrated that RDW tags can link local-scale behavior to broader, regional, or ecosystem-scale processes by monitoring per-dive behavior over multi-month timescales of movement. A previous study using medium-duration tags [24] showed that the number of feeding lunges made per dive is related to the duration of feeding bouts in both blue and fin whales, suggesting that longer term behavioral monitoring can more fully describe the drivers of residence time over the course of the feeding season. In addition, these behaviorally mediated processes such as sex-based habitat partitioning or diel changes in depth related to prey distribution can lead to variable or differential exposure to anthropogenic impacts [24, 26, 71], making this information highly useful to management and conservation organizations.

Conceptually, the flexible sensor configuration and adaptive software capabilities of RDW tags make them generalizable for a variety of applications with cetaceans, which may extend to studies of species that use other foraging tactics, such as raptorial feeding or ram-filtration [72]. Known behavioral cues associated with non-feeding behaviors could also be incorporated into the event detection algorithm, for example to investigate patterns and trends in male singing [73, 74] or agonistic interactions [75]. The tags could also be used to monitor changes in body condition over time based on trends in buoyancy, as implemented through hydrodynamic glide models [76,77,78]. Such information could further inform studies of the effects of anthropogenic disturbance on individuals and how related changes in fitness might scale up to the larger population (e.g., [79, 80]).

Advances in microprocessor technology continue to reduce component size, operating voltage, and current consumption, while at the same time increasing the available on-board memory and processing speed. Future improvements to the software (e.g., refinement of event detection algorithms) and advances in hardware (e.g., addition of other environmental sensors and increasing sensor precision) will further expand RDW tag applications for ecology, management, and conservation. The RDW tag joins a new generation of devices with the technological capacity to collect and, in some cases, process large volumes of data onboard [26, 41, 81]. These advances pave the way for the routine generation of key metrics of dive behavior for marine wildlife onboard non-recoverable smart tags across large spatial and temporal scales, while the ability to dynamically update event detection parameters (e.g., to account for differences in tag placement or behavioral trends) offers opportunities for improved long-term behavioral and physiological monitoring.