Background

Studies on animal behaviour often involve the quantification of individuals’ activities [1], from the definition of an ethogram to the quantification of an activity budget [2]. Knowledge of how individuals allocate their time among different activities is important for understanding their flexibility in response to changes in the environment, such as variations in temperature [3, 4], habitat [5, 6], social systems [7] or prey availability [8, 9].

Traditionally, the assessment of activity budgets has required long hours of observation in the field [10], and has been applied to various species (e.g., primates [11,12,13], birds [14, 15], deer [16], rodents [17], fish [18], bats [19], insects [20], seals [21], cetaceans [22]). However, this is not always practical. For example, direct observations are hindered when animals are active at night, when they spend time in hidden enclosed places (e.g., burrows), or when they travel long distances in remote areas (e.g., dense forest, ocean). In addition, the presence of a human observer can potentially disturb the animals and compromise the integrity of the information collected on their behaviours [23]. Recent technological developments have given rise to devices that can be deployed on animals (i.e., animal-borne devices) and can thus remotely record variables related to the different activities undertaken by study animals. This has greatly enhanced our understanding of time allocation in elusive wild populations [24,25,26,27].

Several types of instruments have been used to study animal activity budgets. First, changes in the geographic location of an animal can inform on its activity. From geographical positions recorded using tracking devices such as radio-transmitters or global positioning systems (GPS), speed [28, 29], sinuosity [30, 31] or a combination of the two [32] can be derived to infer behavioural activities. Second, if species move through different environments to engage in various activities, distinctive features of those environments can be recorded and related to activities. For example, in seabirds, sensors recording the accumulated time spent immersed in water inform on the time spent floating on the water or diving [33,34,35,36]. Similarly, temperature loggers have been used to estimate the time spent in different environments, such as in the water, in the air and on land [37, 38]. For diving species, detailed information on diving behaviour can be obtained from time-depth recorders [39,40,41]. Ultimately, combining data from different sources, e.g., recordings of depth, temperature and light, has been shown to allow robust interpretation of the activities undertaken by elusive animals [42]. Third, since an animal’s behaviour is the direct consequence of its coordinated body movement [43], its body motion and posture can be monitored to make inferences about its behaviour. Acceleration sensors have hence often been used to study animal behaviour [44]. The subsequent development of bi- [45] and then tri-axial [46] accelerometry allowed for more detailed study of animal movements in three dimensions and increased the number of different activities that could be recorded and automatically identified [46]. In addition to time-activity budgets, such information is increasingly used to assess energy expenditure during each activity [47]. Provided that sufficient knowledge of the species and its movements during different activities is available to correctly interpret the motion along every axis, accelerometers are extremely powerful tools to record animals’ activities remotely. As such, they have been widely used on a great diversity of species (reviewed in [48]). However, accelerometry data are limited in the surrounding environmental information they can yield, even though such information can underpin meaningful interpretation of the recorded behaviours.

Animal-borne acoustic devices can record and monitor the vocalizations of animals in various contexts. Beyond vocalizations, sound recordings can also provide information on the activities of animals, since different activities generate different sounds and background noise. Hence, information on the speed of movement (particle flow), on different environments (open air, shelter, water) and on interactions with the environment (browsing, gnawing, digging, scratching, diving, etc.) can be captured. With the recent advancement of acoustic recording technologies, this concept has been explored and applied to visually identify the flipper strokes of seals [49] and the foraging behaviour of deer [50] and bats [51] from spectrograms. Furthermore, the automatic detection of the behaviours and activities of birds from sound data has previously been demonstrated [52]. Acoustic recorders have also been used to improve the automatic classification of behaviours from accelerometer data [53, 54].

Here, we first aimed to solve the challenge of recording sound data through instrument deployment on wild free-ranging seabirds, i.e., species that move both in the air and in the water, most of which dive to feed on marine resources. Second, we developed a procedure based on existing statistical learning methods to automatically identify the activities of equipped individuals, exclusively from animal-borne acoustic data, in order to assess their time-activity budgets. Our study species is the Cape gannet Morus capensis, an endangered seabird endemic to southern Africa [55]. This species has recently been classified as Endangered on the IUCN Red List because of a drastic loss of more than 50% of the population over three generations [56], mostly attributed to a massive decrease in their natural feeding resources due to fisheries [57,58,59]. Cape gannets feed mainly on small pelagic fish, sardines Sardinops sagax and anchovies Engraulis encrasicolus [60]. Their foraging effort, in terms of trip duration and time spent in different activities, reflects the abundance of their natural prey in the local marine environment [61,62,63,64]. Furthermore, their foraging effort directly influences their breeding investment and success [65, 66]. As a consequence, the monitoring of their foraging activities at sea is of particular interest in relation to both the local marine ecosystem and the management of this threatened species. We deployed acoustic recorders on chick-rearing Cape gannets to record their behaviour at sea (data from 10 adults used in this study). Based on previous work with observations from bird-borne video cameras [67], we identified three main activities: floating on the water, flying, and diving. These activities are associated with different sounds that can be identified by a trained human ear, so they were manually labelled on a subset of the data set (data from four randomly selected individuals, representing ~ 33 h of acoustic data). Thirty-five acoustic features were then extracted to acoustically describe the activities. A supervised learning algorithm was trained on the labelled data to automatically identify activities in non-labelled data (a total of ~ 93 h of acoustic data). To do this, five types of supervised learning algorithms were tested using the Classification Learner App (Statistics and Machine Learning Toolbox, Matlab R2019b), and the k-nearest neighbour model was finally chosen for its performance on the diving class (a rare class of high interest). The resulting time-activity budget of foraging Cape gannets, as quantified from acoustic data exclusively, is presented and compared with results obtained from previous studies on the same species using different devices. Furthermore, we conducted a systematic review of studies that automatically classified activities from animal-borne devices and compared the performances obtained from the analysis of various types of devices.

Results

Different sounds for different activities

Each activity undertaken by the Cape gannets when foraging at sea was associated with different sounds recorded by the bird-borne acoustic devices (Fig. 1A).

Fig. 1

Illustration of (A) the sound spectrogram along with (B) the manual identification and labelling of activities and (C, D) the predictions before and after revision. Three main activities were defined and included in the budget (flying, diving and floating on the water) and two additional transition activities (entering water and taking off) were used exclusively for the revision algorithm. These transition activities were used to confirm dive and flying events, and then merged into their corresponding main activity. Isolated segments were removed and relabelled and predictions were smoothed using a moving median over 6 segments

Different values of acoustic features were measured for each activity (Fig. 2), as calculated on sound segments of length ~ 1.4 s (corresponding to 2^14 samples; at the 12 kHz analysis rate, 16,384/12,000 ≈ 1.37 s). For example, the sound spectrogram (Fig. 1A) shows that the sound is louder and spans a wider frequency range during flying than during diving or floating on the water, as reflected in the mean RMS and spectral bandwidth values (see red crosses in Fig. 2A, B). For all the features, though, the distributions for each activity overlap to some extent (Fig. 2).

Fig. 2

Density estimation of a selection of acoustic features for each activity (8 out of 21 temporal features, 8 out of 14 spectral features). Means and medians are represented by blue and black lines, respectively. The red crosses indicate the values for each feature calculated on the data sample illustrated in Fig. 1 (calculated as means over all segments per class)

Automatic identification of activities from sound data

Among the five types of supervised learning algorithms that were tested (see “Materials and methods”), the k-nearest neighbour model was finally chosen, because its ratio of true to false positives for the diving class (the class of highest interest in our study) was higher than that of the other algorithms, while maintaining a similar global accuracy.

The classification procedure correctly classified the activities of Cape gannets (the “labelled set”) with a global accuracy of 98.46%. The performances, as measured by the global confusion matrix and the area under the ROC curve (AUC) for each class, varied per activity (Additional file 1: Figure S1). The sensitivity was lowest for the class “diving” compared to the other classes (Fig. 3): over all “diving” segments, 62.3% (908/1457) were correctly detected (the others were wrongly classified as floating or flying), whereas for “flying” and “floating” segments, > 98% were correctly detected (Fig. 3). Nonetheless, when diving was predicted, the prediction was reliable, given the high precision value (95.5%, Fig. 3). The classes “floating on water” and “flying” were predicted with high accuracy, given the high values of both indicators in all instances (> 97%, Fig. 3). Overall, the numbers of false negatives and false positives were low, as measured by the high value of “Informedness” at 97.66% (the multi-class equivalent of Youden’s index). These results were consistent among the four individuals studied, with similar classification performances between individuals (Additional file 2: Figure S2).

Fig. 3

Performance of the algorithm (after classification and revision) on the labelled data set (data points correspond to time segments of ~ 1.4 s) summed over all individual bird files (4 individuals). The confusion matrix (a 3 × 3 matrix) shows the number of correctly classified events (True Positives, TP) for each class on the diagonal, the number of False Positives (FP) per column for each class (off-diagonal values) and the number of False Negatives (FN) per row for each class (off-diagonal values). Performance indices of Precision (TP/(TP + FP)) and Sensitivity (TP/(TP + FN)) are shown for each class on the bottom row and right column, respectively

When studied in terms of activity budget (meaning that 1.4 s segments are grouped into “events” of the same activity), the number of predicted events was over-estimated, although events were predicted with shorter durations (Fig. 3B). Nonetheless, when studied in terms of time-activity budget, the predicted time spent in each activity was very close to the observed time (difference of 0.3–1.1% depending on the activity, Table 1).

Table 1 Results of the classification algorithm on the labelled data set, when aggregated into behavioural events

Acoustic-based time-activity budget of a seabird

Applying the algorithm to non-labelled data, we found that when foraging in December 2015 from Bird Island (Algoa Bay, South Africa), chick-rearing Cape gannets spent on average 35.1%, 63.7% and 1.2% of their time flying, floating on the water and diving, respectively. Eight of the nine individuals spent most of their time floating on the water, although this proportion varied widely among individuals (range 43.3–80.1% of time, Fig. 4, Additional file 5: Table S1). The estimated number of dives also varied greatly among individuals, from 23 to 174 dives per trip (Fig. 4).

Fig. 4

Time-activity budgets (% of time spent flying in orange, floating on the water in blue and diving in red) of nine Cape gannets as predicted from acoustic data exclusively. The black dots show the number of predicted dives for each individual (y-axis indicated on the right)

Systematic review on automatic classification of activities from animal-borne devices

We extracted information from 61 reviewed classifications (54 articles, including our study), published between 2000 and the 5th of April 2021, that automatically classified activities using supervised learning algorithms applied to data from animal-borne devices (Table 2).

Table 2 Information extracted from 61 reviewed classifications (54 articles, including our study)

Terrestrial species were by far the most studied (n = 40, Table 2, Fig. 5), followed by aquatic species (n = 13) and flying species (n = 8). The most commonly used devices were accelerometers (82% of reviewed studies, Table 2), either alone (n = 34 studies) or in association with other devices (n = 16). Acoustic recorders have rarely been used in this context: we found only three studies that met our criteria for the systematic review. The weight of devices was reported in only 48% of the studies and ranged widely for all device categories (Table 2). The different types of devices varied in terms of sampling frequency, with GPS devices being the most limited (at most 1 Hz), while acoustic recorders provided the highest sampling frequency (> 10 kHz). In comparison, accelerometers were used over a large range of sampling rates, from 0.02 to 100 Hz (Table 2). Although the sampling frequency did not seem to be directly related to the global accuracy, a higher sampling frequency seemed to allow for a higher number of activities in the activity budget (Additional file 3: Figure S3).

Fig. 5

Performance of automatic classifications of activity budgets as measured by the global accuracy, as a function of the type of devices used in the 61 reviewed classifications (from 54 articles, including our study). Colours indicate a categorisation of species: n = 40 terrestrial species (green), n = 13 aquatic species (blue), n = 8 flying species (orange). GPS global positioning systems. Accel accelerometers. Other devices deployed concomitantly with accelerometers included GPS, gyroscopes, magnetometers, pressure sensors and acoustic recorders [26, 32, 52,53,54, 68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115]

The number of activities studied in a budget varied greatly among studies, from two to 19 (Table 2), with a mode at three activities (Fig. 6). The highest number of activities (19, Table 2) was extracted from acoustic recorders, followed by a study based on accelerometers (12 activities). The global accuracy of classification reported in the reviewed studies varied between 65 and 100% (Table 2) and did not seem to be related to the size of the data sets studied (Additional file 4: Figure S4). The highest accuracies were obtained from accelerometer data (Figs. 5, 6), even though a good accuracy (> 90%) could be achieved using data collected from all types of devices (Fig. 5). Among all articles reviewed, the performance of our classification (98.46%), based exclusively on acoustic data, appeared very high and demonstrated that the activity budget of wild animals can be recorded and reconstructed exclusively from acoustic data.

Fig. 6

Performance of automatic classifications of activity budgets as measured by the global accuracy, as a function of the number of activities in the budget, extracted from 61 reviewed classifications (54 articles, including our study). Symbols indicate the type of animal-borne device used to remotely record the behaviour of study animals, and the full red circle indicates the values obtained in our study. Numbers of activities are integers, but a random horizontal offset was added in the display to limit overlap of points. GPS global positioning systems. Other devices deployed concomitantly with accelerometers included GPS, gyroscopes, magnetometers, pressure sensors and acoustic recorders [26, 32, 52,53,54, 68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115]

Ultimately, potentially the most important difference among the types of devices in terms of data yield might be the nature of the other information they provide, in addition to the animal’s activities themselves (Table 2). Accelerometers have been used to reconstruct the energy budget associated with different activities; GPS devices provide information on the geographical position and distribution of the animals; pressure sensors provide information on the diving profiles of aquatic species. In comparison, acoustic recorders provide information on all the sounds surrounding an animal: the biophony (including vocalisations from the equipped animal, its conspecifics, but also heterospecifics), the geophony (all natural but non-biological sounds related to the habitat), and the anthropophony (human-generated sounds).

Discussion

The different activities undertaken by our study animals were associated with distinguishable sets of acoustic features. They could thus be automatically identified from sound data exclusively, with very good accuracy (98.5% global accuracy). Although the performances varied per class (i.e., the three main activities: floating on water, flying, and diving), the precision was consistently very high (95.5–99.4%, n = 3 activities), showing that the activities could be predicted with high confidence, especially when studied as the percentage of time spent in each activity. Our results compared favourably to those of other studies using acoustic data to infer behaviour [52,53,54] and compared well with all previously published studies that automatically classified activities based on animal-borne devices (Fig. 6). Interestingly, our results based on acoustic data showed a higher classification performance than a previous study classifying the same activities of the same study species based on speed and turning angles derived from geographical location data (92.3% global accuracy, 91.8–94.8% precision [32]). In addition to high predictive performance, acoustic devices provide information on the surrounding biophony, geophony and anthropophony that can be used to contextualize the observed behaviours. They thus appear to be a valuable alternative to other devices for the monitoring of animal behaviours.

By inferring the behaviour of birds from acoustic data, we were able to estimate the time-activity budget of breeding Cape gannets during their foraging trips. Our estimates are comparable with those of previous studies on the same species, with Cape gannets always spending proportionately more time on the water than flying: 64% and 35% (this study), 58% and 41% (breeding season 2001–2002 at Bird Island in Lambert’s Bay, based on three-dimensional accelerometry data [116]), and 68% and 31% (breeding season 2012–2013 at Bird Island in Algoa Bay, based on geographical location data [117]), respectively. The number of dives predicted in our study was also within a similar range to previous studies: 23–174 (this study), 10–110 (breeding season 2012–2013 at Bird Island in Algoa Bay, based on time-depth recorders [118]), and 12–218 (breeding seasons 2012 and 2014 on Malgas, based on time-depth recorders [57]).

Various devices are available to remotely record an animal’s behaviours and activities. Our systematic review showed that accelerometers are the devices most commonly used for this purpose, even though a good classification accuracy can be obtained from a range of devices. The weight of devices did not appear to be the most limiting factor, since all types of devices can be found at a relatively small size (< 20 g, the smallest device being an accelerometer at 2 g). Beyond weight, the sampling frequency of the different types of devices might also be an important factor, since our results suggest that a higher sampling frequency may provide access to a higher number of recorded activities, and thus a more detailed description of the animal’s behaviours. In this respect, the most limiting device would be the GPS, and the device with the highest potential would be the acoustic recorder. Ultimately, if technical aspects can be overcome (e.g., deployment techniques and weight of devices, data analyses and classification algorithms using recent machine learning techniques), our systematic review suggested that the most important factor to consider when choosing a device for recording an animal’s activities should be the access to additional information. Indeed, although all types of devices can classify an animal’s activities with good accuracy, they record different variables. As a consequence, they each provide additional information on different aspects of the animal’s behaviours. Accelerometers record the fine-scale movements of animals in three dimensions, and thus provide details on movement-related activities [48, 116, 119]. In addition to behavioural activities, accelerometers can be used to measure the energy expenditure of animals during different activities and thus allow for reconstructing dynamic energy budget models [47]. Time-depth recorders are best suited to aquatic animals, providing detailed information on their diving behaviour [40, 120, 121]. In comparison, acoustic recorders do not measure the displacement or body movement of animals directly, yet our study showed that they can be used alone to reconstruct the activities of animals with very high accuracy, comparable to that obtained using other devices, such as accelerometers. In addition, acoustic recorders simultaneously record the biophony, geophony and anthropophony in the environment of equipped animals, and thus provide a large diversity of other information that can be essential to interpret animal behaviours in a meaningful way. The physiology (heart rate) and the breeding behaviour (hatchling sounds in a burrow) of some species can be recorded remotely using acoustic devices [122]. The surrounding environment of equipped animals is also recorded and could help contextualize specific behaviours [52]. The vocalizations of equipped animals allow the study of variations in social interactions and grouping behaviours in different contexts [123, 124]. Furthermore, multi-species associations can be recorded. For example, in our data set, we recorded dolphin whistles underwater during some of the dives performed by equipped Cape gannets (data not shown). We could imagine that interactions between seabirds and fisheries or other human marine activities could be recorded as well.

Similar information on the surrounding context of animals can also be obtained using animal-borne video cameras [125,126,127], but in comparison acoustic recorders are much smaller in size and weight (which can be crucial for deployments on wild animals), can record continuously for a much longer duration, and record sounds from all directions, whereas cameras are limited by their field of view. Ultimately, combining different recorders may help reconstruct a more comprehensive understanding of animal behaviour in the natural environment [42, 53, 54], as long as this is done without compromising the welfare and behaviour of the study animals [128].

Conclusion

This article demonstrates the use of animal-borne acoustic data alone to automatically infer the activities of wild, elusive animals with high accuracy. In addition to animals’ activities, acoustic recorders provide information on the surrounding environment of equipped animals (biophony, geophony, anthropophony) that can be essential to contextualize and interpret the behaviour of study animals. They therefore show promise as a valuable, and potentially more regularly used, alternative within the set of devices used to record animal activities remotely.

Materials and methods

Data collection

Fieldwork took place on Bird Island (Algoa Bay, South Africa) during December 2015. We deployed twenty devices (details below) on chick-rearing Cape gannets to record their behaviour while foraging at sea. Four individuals were randomly selected for manual identification of activities and model training. The trained model was then applied to automatically predict time-activity budgets from the data of birds whose entire foraging trip was recorded, comprising another six individuals (trips not recorded in full resulted from progressive water damage).

Deployment procedure

Birds departing to sea were captured near their nest using a pole with a hook on the end. Only one parent was captured per nest, and devices were attached for one foraging trip only (usually 1–2 days), while the partner was on the nest guarding the chick. Nests were then monitored every hour from sunrise to sunset, and the deployed birds were captured again soon after their return to the colony and the devices were retrieved. Birds were handled for 8 and 6 min on average for the first and second capture, respectively. The handling procedure consisted of attaching devices (using adhesive tape, Tesa, Germany) and measuring the bird's body mass at the first capture (average 2580 g, n = 10 birds, measured with a Pesola scale, Baar, Switzerland, precision 50 g), and retrieving devices and taking standard measurements (not used in this study) at the second capture. Acoustic recorders were deployed in combination with a GPS (global positioning system) device on eight birds (total mass 60 g, 2.3% of bird body mass), a GPS and a video camera on one bird (90 g, 3.4% of bird body mass), or a time-depth recorder and a video camera on 11 birds (80 g, 3.1% of bird body mass). The devices had no significant effect on the duration of foraging trips when compared between equipped and non-equipped birds (for details see [124]), so normal behaviour was assumed. Only the data from the acoustic recorders were used in this study.

Acoustic recorders

Audio recorders (Edic-mini Tiny + B80, frequency response 100 Hz–10 kHz ± 3 dB, 65 dB dynamic range, TS-Market Ltd., Russia, fitted with a CR2450 battery, 16.2 g, autonomy estimated at ~ 50 h at 22 kHz in our study, and quoted at 190 h at 8 kHz by the manufacturer) were set up to record sound in mono at a sampling frequency of 22.05 kHz. They recorded continuously, hence collecting data during the whole foraging trip of the birds. The main challenge in collecting such acoustic data was to ensure high quality recordings on board a flying and diving bird. To limit disturbance from the wind, we placed the audio recorder on the lower back of the bird, under the feathers and facing backwards. In addition, a thin layer of foam was added after the first deployment to reduce flow and background noise. We sealed the microphones in nitrile glove material (amplitude attenuation of 6 dBSPL both in air and in water, no modification of the frequency response, as measured in the laboratory) to keep the devices sufficiently dry when immersed in sea water while still ensuring good quality sound recordings (avoiding a thick waterproof casing).

Manual identification and labelling of activities

The activities of Cape gannets when foraging at sea were manually identified on a subset of our data set (henceforth referred to as the “labelled dataset”). The data retrieved from four deployed Cape gannets were randomly selected, comprising ~ 33 h of recordings. Based on previous work with observations from bird-borne video cameras [67], we identified three main activities: floating on the water, flying, and diving. These three activities are associated with different sounds that can clearly be identified by a trained human ear (Fig. 1). When the bird is flying, the wind is usually loud and the wing flapping can sometimes be heard. When the bird is on the water, the ambient noise is usually quieter, sometimes with water splashing sounds. The take-off is distinguishable by loud flapping at a high rate. Gannets dive into the water at high speed, up to 24 m s⁻¹ [129], so they enter the water with a loud impact noise, often saturating the amplitude of the recording. Coming out of the water is also usually loud, with sounds of rising bubbles. To manually label these data, the spectrograms of the selected sound data were visually inspected while the sound was played concomitantly, using the software Avisoft-SASLab Pro (version 5.2.09, Avisoft Bioacoustics, Germany). A total of 318 “floating on the water” events, 391 “flying” events and 243 “diving” events were identified and labelled. These labelled data were then used to characterize the acoustic properties of each activity and to train the classification algorithm (using a cross-validation procedure, details below).

Characterization of activity from acoustic features

To characterize the bird’s activity from the sound recordings, an automatic feature extraction was applied. For each sound recording, the algorithm followed four steps. First, the sound data were downsampled to 12 kHz. Second, to remove low-frequency acoustic noise, the sound recordings were high-pass filtered (above 10 Hz) using a second-order Butterworth filter. Third, the recordings were divided into small sound segments of ~ 1.4 s (corresponding to 2^14 samples). This segment length was chosen to reflect the dynamics of movement of our study species. In particular, the dives last 20 s on average (minimum 6 s) and always start with an ‘entering the water’ phase that displays very specific sound features (Fig. 2) and lasts 1–2 s. A segment length of 2^14 samples (corresponding to ~ 1.4 s) thus appeared most appropriate. The algorithm was also tested using segment lengths of 2^13 (0.68 s) and 2^15 (2.76 s) samples, which led to similar results (not shown). Fourth, a set of temporal (n = 21) and spectral (n = 14) features was extracted from each sound segment to acoustically describe the activities. Temporal features included envelope features, such as root mean square (RMS), peak-to-peak and peak-to-RMS values (means and standard deviations), as well as signal skewness, kurtosis, entropy, quantiles and zero crossing rate. Spectral features were computed from the power spectrum (Fast Fourier transform) and included dominant frequency features (dominant frequency value, magnitude, ratio to the total energy, bandwidth at − 10 dB), spectral centroid and spectral flatness (the latter two computed as per [130]), in addition to quartiles of energy and the ratios of energy above three fixed thresholds (300, 1500 and 5000 Hz). All acoustic features were computed using Matlab R2019b custom scripts.
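As an illustration of these four steps, a minimal Matlab sketch is given below. The original custom scripts are not published, so this is a sketch under stated assumptions: the file name and the four example features are illustrative stand-ins for the full set of 35 features, and zero-phase filtering with filtfilt is one possible implementation choice.

% Step 1: read and downsample to 12 kHz ('gannet_trip.wav' is a hypothetical file)
[x, fsIn] = audioread('gannet_trip.wav');
x  = x(:, 1);                                 % keep the mono channel
fs = 12000;
x  = resample(x, fs, fsIn);

% Step 2: second-order Butterworth high-pass filter above 10 Hz
[b, a] = butter(2, 10/(fs/2), 'high');
x = filtfilt(b, a, x);                        % zero-phase filtering (one possible choice)

% Step 3: cut into non-overlapping segments of 2^14 samples (~1.4 s)
segLen   = 2^14;
segments = buffer(x, segLen);                 % one segment per column
segments = segments(:, 1:floor(numel(x)/segLen));   % drop the zero-padded tail

% Step 4: example temporal and spectral features per segment
nSeg  = size(segments, 2);
feats = zeros(nSeg, 4);
for k = 1:nSeg
    s = segments(:, k);
    feats(k, 1) = rms(s);                     % envelope level (temporal)
    feats(k, 2) = peak2rms(s);                % crest factor (temporal)
    P = abs(fft(s)).^2;                       % power spectrum
    P = P(1:segLen/2);
    f = (0:segLen/2 - 1)' * fs / segLen;
    [~, iMax]   = max(P);
    feats(k, 3) = f(iMax);                    % dominant frequency (spectral)
    feats(k, 4) = sum(P(f > 1500)) / sum(P);  % energy ratio above 1.5 kHz (spectral)
end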

The three main activities were re-defined into five categories: floating on the water, taking off (the first three segments of flying when preceded by floating on the water), flying, entering the water (the first segment of diving when preceded by flying), and diving. The two transition classes were used for the ‘revision algorithm’ described in the following section (“Classification procedure”).

Classification procedure

The labelled data set was used to train and test a classification algorithm following a fivefold cross-validation procedure. Briefly, this procedure consisted of splitting the data set into a training set containing 4/5 of the data to train the algorithm, and testing it on the remaining 1/5. This partitioning of the data into training and test sets was done five times, and the performances of the algorithm on the test sets were averaged over those five replications.

Five types of supervised learning algorithms were tested (decision trees, discriminant analysis, support vector machines, nearest neighbour classifiers and ensemble classifiers), with some providing high classification results (above 90%). Among them, the k-nearest neighbour model was finally chosen, because its ratio of true to false positives for the diving class (the class of highest interest in our study) was higher than that of the other algorithms, while maintaining a similar global accuracy. The k-nearest neighbour algorithm was implemented with five neighbours, Euclidean distance as the distance metric and equal distance weights.
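In the Statistics and Machine Learning Toolbox, the chosen model and the fivefold cross-validation described above can be reproduced as follows; a minimal sketch, assuming a feature matrix X (one row per ~ 1.4 s segment, 35 columns) and a label vector Y (variable names are illustrative):

mdl = fitcknn(X, Y, ...
    'NumNeighbors',   5, ...            % five neighbours
    'Distance',       'euclidean', ...  % Euclidean distance metric
    'DistanceWeight', 'equal');         % equal distance weights

cvmdl = crossval(mdl, 'KFold', 5);      % fivefold cross-validation
acc   = 1 - kfoldLoss(cvmdl);           % global accuracy, averaged over the five folds
yPred = kfoldPredict(cvmdl);            % out-of-fold prediction for each segment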

In all tested models, each sound segment was considered independent of the others. As a strong temporal dependence exists (for instance, Cape gannets do not fly immediately after diving without transitioning on the water), a ‘revision algorithm’ was subsequently applied to the results of the classification procedure. First, ‘entering the water’ segments were used to confirm a dive event, or were deleted if no diving segment followed them. A similar procedure was used with the take-off and flying segments. Then, transition segments were merged into their corresponding class (‘entering the water’ was relabelled and merged with its associated diving event, and take-off was similarly merged with flying). Isolated segments (defined as segments of one class occurring within a 6-segment window of segments of another class) were relabelled so that a coherent 6-segment window of a unique event was kept (Fig. 1C, D). Finally, predictions were smoothed using a moving median over a 6-segment window (corresponding to ~ 8.42 s) to further reduce rapid changes in the predicted class over short durations and thus improve the prediction of events.
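The exact revision code is not published; the sketch below is one possible Matlab implementation under stated assumptions (the integer class coding, the 3-segment look-ahead for take-offs, and the rounding of even-window median ties are our choices, not the authors'):

% Assume yPred holds an integer class code per ~1.4 s segment:
% 1 = floating, 2 = taking off, 3 = flying, 4 = entering water, 5 = diving
y = yPred(:);
n = numel(y);

% Confirm transitions: an 'entering water' segment not followed by a dive,
% or a take-off not followed by flight, is relabelled to the preceding class
for k = 1:n-1
    if y(k) == 4 && y(k+1) ~= 5
        y(k) = y(max(k-1, 1));
    elseif y(k) == 2 && ~any(y(k+1:min(k+3, n)) == 3)
        y(k) = y(max(k-1, 1));
    end
end

% Merge transition classes into their main activity, then recode to
% consecutive integers (1 = floating, 2 = flying, 3 = diving)
y(y == 2) = 3;   y(y == 4) = 5;   % take-off -> flying, entering water -> diving
y(y == 3) = 2;   y(y == 5) = 3;

% Smooth with a moving median over 6 segments (~8.4 s); rounding resolves
% even-window ties (a simplification of the published isolated-segment rule)
y = round(movmedian(y, 6));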

All algorithms were implemented using Matlab R2019b and the Statistics and Machine Learning Toolbox. Four metrics were used to assess the accuracy of prediction: the global accuracy (total number of segments correctly classified divided by the total number of segments); the sensitivity (also called recall or true positive rate), which measures the proportion of actual positives that are correctly classified; the precision (also called positive predictive value), which measures the ratio of True Positives over all predicted positives; and the “Informedness” (the multi-class equivalent of Youden’s index), which summarises the sensitivity and specificity indices over all classes.

Precision = TP/(TP + FP) and Sensitivity = TP/(TP + FN), where TP stands for True Positive, FP for False Positive and FN for False Negative.
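All four metrics can be derived from the confusion matrix; a sketch is given below, assuming label vectors yTrue and yPred. The multi-class informedness is computed here as the prevalence-weighted mean of the per-class (sensitivity + specificity − 1), following Powers' generalisation of Youden's index; the exact aggregation used in the study is our assumption.

C  = confusionmat(yTrue, yPred);   % rows = true class, columns = predicted class
N  = sum(C(:));
TP = diag(C);
FN = sum(C, 2) - TP;               % row sums minus the diagonal
FP = sum(C, 1)' - TP;              % column sums minus the diagonal
TN = N - TP - FN - FP;

globalAccuracy = sum(TP) / N;
sensitivity    = TP ./ (TP + FN);  % recall / true positive rate, per class
precision      = TP ./ (TP + FP);  % positive predictive value, per class
specificity    = TN ./ (TN + FP);
prevalence     = (TP + FN) / N;
informedness   = sum(prevalence .* (sensitivity + specificity - 1));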

Application: acoustic-based time-activity budget of Cape gannets

The classification algorithm was applied to unlabelled acoustic data to predict the activities of Cape gannets when foraging. Only the data covering full foraging trips were kept at this stage. These included six new individuals, plus one individual for which part of the data had been labelled and used in the trained model. The activity of birds was then predicted on a total of ~ 93 h of acoustic recordings. The time-activity budgets (based on the number and duration of events) of unlabelled trips were computed by grouping successive segments (~ 1.4 s) of the same activity into ‘events’ (see, for example, Fig. 1D). For instance, a 7-s period of diving, corresponding to 5 consecutive time segments labelled as diving, was considered one diving ‘event’.
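This event grouping amounts to a run-length encoding of the revised label sequence; a minimal Matlab sketch, assuming y is the revised column vector of integer class codes (variable names are illustrative):

segDur   = 2^14 / 12000;                  % segment duration, ~1.37 s
starts   = [1; find(diff(y) ~= 0) + 1];   % first segment of each event
eventCls = y(starts);                     % class of each event
eventLen = diff([starts; numel(y) + 1]);  % event length in segments
eventDur = eventLen * segDur;             % event duration in seconds

% Time-activity budget: percentage of trip time spent in each class
classes = unique(y);
budget  = zeros(size(classes));
for k = 1:numel(classes)
    budget(k) = 100 * sum(eventDur(eventCls == classes(k))) / sum(eventDur);
end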

Systematic review

To place our study in perspective and discuss the use of acoustic recorders among the different devices available for remotely recording and inferring behaviour, we conducted a systematic review of articles that automatically classified activities from animal-borne devices. We searched for articles in a systematic, repeatable way, using the ISI Web of Science Core Collection database. Our search included articles in English from 2000 to 2021, and was based on the following keywords:

(((((((((TS = ((“time budget*” OR "time-budget*" OR “activity budget*” OR "activity-budget*" OR “time-activity budget*” OR “state budget*” OR “behavio*ral state*” OR “behavio*r-time budget*” OR “behavio*r* classif*” OR “behavio*r discrimination” OR "behavio*r* categor*" OR "scene-classif*") AND (recorder* OR device* OR tag* OR biologging OR bio-logging OR logger* OR datalogger* OR biologger* OR bio-logger* OR collar* OR sensor* OR "animal-borne" OR "animal borne") AND (behavio*r*) AND (classif* OR accuracy OR “machine-learning” OR “machine learning” OR “supervised learning” OR “feature learning” OR "infer* behavio*r*"))))))))))).

On the 5th of April 2021, this query returned a list of 202 articles. These articles were first checked for relevance to our scope: use of animal-borne devices on non-human animals to record and infer activity budgets, training of an automatic classification (supervised learning algorithm) on data with direct observation (visual or video recorded), and quantification of the algorithm’s performance. This resulted in a final list of 54 articles from which information was extracted. If several classifications were performed in an article (data from different devices, or classifications of different activities), one line of data was extracted for each classification. The information extracted included: the species studied, a categorisation of the species (flying, terrestrial, aquatic), the number of individuals equipped, the devices attached to the animals (all devices, the ones used to infer activities, and the ones used to train and validate the classification), the weight of the devices (as a mass and as a percentage of the animal’s body mass), the size of the data set (as a number of data points), the sampling frequency, the number of activities, the list of activities, the algorithms used, the global accuracy obtained, other performance measures (when provided), the percentage of data used for training, and the use (or not) of a cross-validation procedure. The entire data table can be found in Additional file 6: Table S2, from which we extracted the information provided in the main text. We identified five categories of devices used: accelerometers alone, accelerometers combined with other devices, GPS devices, acoustic recorders, and pressure sensors. We then compared the global accuracy obtained by the different studies, as a function of the type of devices used to infer activities and of the number of activities in the budget. We acknowledge that the measure of global accuracy is limited and does not fully characterize classification performance. In particular, it does not reflect the performance for the different behaviours and can hide poor performance on rare behaviours (which are often of higher interest in biology and ecology). However, global accuracy is the most standard performance measure used, and was the only one that we could extract from (almost) all reviewed articles to allow comparison.