A simple tool for linking photo‑identification with multimedia data to track mammal behaviour

Identifying individual animals is critical to describe demographic and behavioural patterns, and to investigate the ecological and evolutionary underpinnings of these patterns. The traditional non-invasive method of individual identification in mammals, comparison of photographed natural marks, has been improved by coupling other sampling methods, such as recording overhead video, audio and other multimedia data. However, aligning, linking and syncing these multimedia data streams are persistent challenges. Here, we provide computational tools to streamline the integration of multiple techniques to identify individual free-ranging mammals when tracking their behaviour in the wild. We developed an open-source R package for organizing multimedia data and for simplifying their processing a posteriori: "MAMMals: Managing Animal MultiMedia: Align, Link, Sync". The package contains functions to (i) align and link the individual data from photographs to videos, audio recordings and other text data sources (e.g. GPS locations) from which metadata can be accessed; and (ii) synchronize and extract the useful multimedia (e.g. videos with audio) containing photo-identified individuals. To illustrate how these tools can facilitate linking photo-identification and video behavioural sampling in situ, we simultaneously collected photos and videos of bottlenose dolphins using off-the-shelf cameras and drones, then merged these data to track the foraging behaviour of individuals and groups. We hope our simple tools encourage future work that extends and generalizes the links between multiple sampling platforms of free-ranging mammals, thereby improving the raw material needed for generating new insights in mammalian population and behavioural ecology.


Introduction
Natural populations change in size and composition, propelling the dynamics of ecological communities, species interactions, and energy flow through the ecosystem (Odum and Barrett 1971). At the heart of these changes are individual animals being born, growing, behaving, and dying. Individual-based data provide the raw material to investigate the mechanics and dynamics of these natural populations, their ecological and behavioural interactions and their evolution (Coulson 2020), which is particularly necessary in longitudinal studies (Clutton-Brock and Sheldon 2010). Therefore, a deep understanding of these patterns and processes in animal ecology requires identifying and tracking individual animals over time and space (Coulson 2020).
The available invasive and non-invasive methods for sampling individual animals present trade-offs in the accuracy, content and quality of the data they provide. Invasive methods require capturing animals to mark them (e.g. with collars, tattoos, tags, freeze branding; Silvy et al. 2005) or to fit tracking devices (RFID, GPS, acoustic, satellite tags: e.g. Krause et al. 2013), but provide detailed information about the individuals (e.g. identity, location, behaviour, health). Actively capturing and marking animals, however, can be unfeasible, expensive, or disruptive to natural behaviour or physiology (Walker et al. 2012). By contrast, non-invasive identification methods, such as photographic, acoustic and video recording (Karczmarski et al. 2022a, b), rely on systematic comparison of natural marks or behaviours (e.g. Karanth and Nichols 1998; Muller et al. 2018; Longden et al. 2020) to track individuals from a distance (e.g. Clapham et al. 2020; Ferreira et al. 2020). Although efficient in providing individual identities, non-invasive methods generally provide less information on other biological variables (but see Toms et al. 2020), which has motivated the simultaneous use of other multimedia sampling platforms, such as video (e.g. Raoult et al. 2018; Francisco et al. 2020; Landeo-Yauri et al. 2020) and audio recordings (Cheng et al. 2012; Erbe et al. 2020). Novel technologies for identifying and tracking individuals using such multimedia data are becoming increasingly precise in the lab or in captivity (e.g. Mersch et al. 2013; Dell et al. 2014; Pérez-Escudero et al. 2014; Alarcón-Nieto et al. 2018; Graving et al. 2019; Marks et al. 2021), but doing so in situ remains more challenging (e.g. Ferreira et al. 2020; Guo et al. 2020). In the field, where animals are not spatially constrained, recording data from multiple sampling platforms simultaneously, or syncing large volumes of data to link them with individual identities a posteriori, can be troublesome.
In wild mammal research, cetacean studies exemplify the continuous development of non-invasive individual identification methods based on multimedia data. Photo-identification has been the go-to technique for recognizing individual whales and dolphins over the last five decades (e.g. Würsig and Würsig 1977; Katona and Whitehead 1981; Hammond et al. 1990). Since whales and dolphins can range over large areas and spend long periods underwater, photo-identification has been increasingly coupled with other multimedia sampling to detect the presence of individuals and/or describe their behavioural patterns. For instance, while cameras and acoustic sampling provide invaluable underwater perspectives, the growing market of unmanned aerial vehicles (drones) has popularized the recording of behaviour, movement and health of cetaceans from an overhead view (e.g. Torres et al. 2018; Gray et al. 2019; Hartman et al. 2020). With few exceptions, however, these sampling techniques do not provide individual identities (but see, e.g., identification from overhead images: Payne et al. 1983; Durban et al. 2015; or from acoustic signals: Janik and Sayigh 2013). Combining traditional photo-identification sampling with hydrophones, underwater cameras and drone cameras can resolve this limitation, but it inevitably creates another one: individual behavioural tracking from multiple platforms generates a large and multidimensional dataset that rapidly becomes unfeasible to handle manually. These technological advances have therefore produced a need for corresponding advances in computational tools to organize and process multiple data streams (e.g. Schneider et al. 2019).
Here, we introduce a free and open computational tool for aligning, linking and syncing photo-identification data with other multimedia data of free-ranging vertebrates. The R package MAMMals (Managing Animal MultiMedia: Align, Link, Sync) contains functions to synchronize different multimedia data streams a posteriori and so facilitate their post-processing to measure relevant biological and behavioural data. Using MAMMals, one can (i) extract, organize and line up the metadata of photographs, videos, audio files, drone flight logs and any other timestamped text data; (ii) select, trim and export clips or stills of the footage or audio recordings containing individual photo-identification; and (iii) wrangle, convert and plot data from cameras, drones, hydrophones, microphones and other timestamped data sources. In what follows, we describe the workflow for pre-processing individual photo-identification and linking it to other multimedia data (Fig. 1). Next, we illustrate the utility of these tools by applying them to process and analyse empirical data on the foraging behaviour of coastal bottlenose dolphins. We conclude by discussing the caveats of our approach and how future work can alleviate them.

Workflow overview: coupling photo-identification with other multimedia data
The MAMMals R package targets the challenge of coupling large volumes of observational and multimedia data to traditional techniques of identifying individuals, thereby extending the possibilities for studies that use focal-animal and focal-group sampling methods (Altmann 1974). The minimum requirements are image files with assigned individual identification and at least one other multimedia data source. The workflow follows four steps (Fig. 1): (i) extracting the metadata of photographs and any other multimedia files available; (ii) aligning the metadata of these files to select the useful multimedia containing photo-identified individuals; (iii) linking these selected files by clipping the multimedia containing photo-identified individuals; and (iv) syncing media and text data around their intersection time. We detail each step of the MAMMals workflow in the next sections, and provide instructions and examples of the input and output files in an online tutorial (https://mammals-rpackage.netlify.app/index.html).
[Displaced Fig. 1 caption, panels b–e: (b) individual identifications from pre-processed data can be passed directly to the function getPhotoMetadata; (c) the metadata of photographs (or timestamped field notes) are aligned with those of the other media to automatically select the video or audio files containing individual photo-identification data; (d) the selected media are linked by clipping the videos and audios around the information of interest (e.g. photo-identified individuals) to facilitate the post-processing of videos (getVideoClip), audios (getAudioClip) or stills from the video (getVideoFrame); if sampling includes drone videos, the selected media can be linked to information from the flight, such as latitude, longitude and altitude; (e) media and/or text are synced by subsetting only the time intersection between data coming from different sampling platforms, and the synced multimedia and text data can be exported as a single merged file or as multiple separate files.]
The MAMMals R package can be installed from the online repository (instructions at https://bitbucket.org/maucantor/mammals/). It depends on the R environment (R Core Team 2021) and key R packages such as lubridate (Grolemund and Wickham 2011) to manage date-time formats (for the full list of dependencies, see the package repository), as well as on the external software ExifTool (https://exiftool.org) to extract the metadata of media files and FFmpeg (https://ffmpeg.org) to clip video and audio files.
To align, link and sync multiple data sources, the MAMMals workflow relies on timestamped files: essentially, the recording times of the multiple sampling equipment are extracted from the metadata of the media files and lined up. Therefore, the most important recommendations for field sampling are to synchronize the clocks of all collection platforms and to keep the original metadata of the media files unaltered. For accurate results, we recommend that the clocks of cameras, drones, audio recorders and auxiliary equipment (such as cell phones used to pilot the drone, or tablet apps used to record observation data) be adjusted to the maximum precision possible, either from GPS satellites or manually, and be double-checked and fine-tuned before each sampling occasion to account for clock drift. For example, one can photograph and film a reference clock prior to sampling, or use audio or visual signals during sampling (e.g. the camera flash in our case study detailed below) to offset time differences across images and videos.
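This offsetting is simple timestamp arithmetic, which can be sketched outside the package. The snippet below is a minimal Python illustration with hypothetical function names (not the MAMMals API): it estimates a clock offset from one shared reference event, such as the flash described above, and applies it to later timestamps.

```python
from datetime import datetime, timedelta

def clock_offset(event_time_camera, event_time_drone):
    # Offset of the drone clock relative to the camera clock,
    # estimated from a single shared reference event (e.g. a flash).
    return event_time_camera - event_time_drone

def to_camera_time(drone_timestamp, offset):
    # Shift any drone timestamp onto the camera's clock.
    return drone_timestamp + offset

# The flash fires at 10:00:05 on the camera clock but appears at
# 10:00:03 in the drone video: the drone clock runs 2 s slow.
offset = clock_offset(datetime(2021, 6, 1, 10, 0, 5),
                      datetime(2021, 6, 1, 10, 0, 3))
corrected = to_camera_time(datetime(2021, 6, 1, 10, 15, 0), offset)
```

Once the offset is known, every timestamp from the slower clock can be shifted before alignment, avoiding mismatches between photographs and footage.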
When photographing animals for individual identification using natural marks, we recommend following the protocols for collecting, processing and organizing such data, which have been extensively detailed elsewhere (e.g. Speed et al. 2007; Urián et al. 2015). We highlight that using DSLR cameras equipped with GPS and a digital compass can be useful when teasing apart photo-identified individuals in the field, especially when tracking them with overhead videos. For instance, when tracking multiple individuals or groups distributed in space, one can assign the photographs taken to each group recorded in the overhead footage by interpreting the GPS coordinates and shooting angle extracted from the photograph metadata. After the photographic sampling, we recommend first processing the photo-identification data and organizing it in a plain-text data frame, in which the first column contains the photograph file name and extension (e.g. '6Q1A8164.JPG') and the second contains the individual (alphanumeric) identification code (e.g. 'ID1248').
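As an illustration, such a two-column table can be produced with any tool; the sketch below builds it with Python's standard csv module (the header names are our own and purely illustrative, not a format required by the package).

```python
import csv
import io

# Two illustrative rows: photograph file name, then individual ID code.
rows = [("6Q1A8164.JPG", "ID1248"),
        ("6Q1A8165.JPG", "ID0307")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["photo", "id"])  # header names are illustrative
writer.writerows(rows)
photo_id_table = buf.getvalue()
```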
When recording audio, we recommend using recorders that produce timestamped files. Otherwise, one can manually check the end time of recordings after sampling and rename the files accordingly with date and time. When recording videos from small drones (e.g. DJI Phantom, DJI Mavic Pro, DJI Inspire, Splash Drone) while simultaneously collecting photo-identification or audio recordings, we recommend keeping a constant flight height and pointing the camera straight down (i.e., drone and camera pitch = −90°, roll = 0°) to ensure the centre of the frame matches the coordinates recorded by the drone GPS and to reduce the distortion in any measures taken from the drone footage. If measuring animals from the drone footage using photogrammetry, there are additional requirements. In addition to camera tilt, aircraft altitude data are the main issue for precise and unbiased photogrammetric measurements. Off-the-shelf drones record the altitude relative to the aircraft's take-off position ("home point"). Hence, if the aircraft takes off from the deck of a ship or from higher ground, the zero in the aircraft's barometer does not match sea level. To mitigate this, an object of known length can be used to calibrate a scale (details in Burnett et al. 2019). Another solution is to couple a LiDAR sensor to the drone (e.g. Dawson et al. 2017) to precisely measure the distance from the aircraft to sea level. Correcting lens distortion and calibrating the camera also reduce errors in measurement estimates (see Dawson et al. 2017).
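The known-length calibration mentioned above reduces to a single scale factor. A minimal sketch (the helper names are ours, not functions from the package or from Burnett et al. 2019):

```python
def metres_per_pixel(known_length_m, measured_px):
    # Scale factor calibrated from an object of known length in the frame.
    return known_length_m / measured_px

def pixels_to_metres(px, scale):
    # Convert any in-frame pixel measurement to metres.
    return px * scale

# A 1-m reference spanning 40 px gives 0.025 m/px, so an animal
# measuring 96 px across the frame is estimated at 2.4 m.
scale = metres_per_pixel(1.0, 40)
body_length_m = pixels_to_metres(96, scale)
```

Note that this single scale is only valid at a constant flight height and with the camera pointing straight down, which is why those flight settings are recommended above.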
Step 1: extracting metadata of multimedia files
For the photo subfolder, use getPhotoMetadata to extract the metadata of the photographs into a text database (e.g. .csv or .txt) and to then assign individual IDs to the database using any text editor or spreadsheet software (e.g. Microsoft Excel, Apple Numbers). Bear in mind, however, that issues with date and time formats and precision are common when using spreadsheet software; thus, we suggest using plain-text editors to avoid losing precision when aligning, linking or syncing the photo-identification to the multimedia data.
For the audio subfolder, use getAudioMetadata to extract the metadata of audio files (at least duration, initial and final time). If the audio files do not contain date and time in their metadata, the initial and final times of recordings can be extracted from filenames automatically generated with date-time stamps, as exported by commonly used autonomous recorders (e.g. Whytock and Christie 2017; Hill et al. 2019). To extract the metadata of the videos (at least duration, initial and final time), access the video subfolder with getVideoMetadata. If videos were recorded with drones, additional metadata may be available (e.g. altitude, GPS coordinates) and will be extracted and organized into a text database as well. Most commercially available drones save detailed logs of every flight. Information on aircraft sensors, motors, battery, remote controller and media is logged on-board and on remote applications, often using proprietary file structures. Hobbyists (e.g. DatCon, TXTlogtoCSVtool), companies (e.g. https://ta.com) and forensic researchers (e.g. Clark et al. 2017) have been developing tools to decode flight logs into readable .csv files. Alternatively, the MAMMals R package can extract the basic flight log data recorded by DJI drones. These drones can produce timestamped subtitles (1 Hz data) logging the aircraft latitude, longitude and height (calculated from the aircraft barometer), the home point latitude and longitude, and camera settings. However, subtitles do not contain auxiliary information on the aircraft and camera roll, pitch and angle, and the accuracy of latitude and longitude is limited to 10 m. Conveniently, though, subtitles are natively exported from DJI drones as text files (.srt) along with the video files, and the MAMMals readSRT function can read all .srt files in a folder and return an R data frame with the formatted metadata of the DJI drone flight logs.
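To make the subtitle idea concrete, the sketch below parses one hypothetical flight-log subtitle entry with a regular expression. Real DJI .srt layouts vary across models and firmware versions, so both the entry and the pattern here are illustrative assumptions, not the exact format readSRT expects.

```python
import re

# One hypothetical 1-Hz flight-log subtitle entry (illustrative only).
entry = """1
00:00:00,000 --> 00:00:01,000
GPS(-48.7817,-28.4861,0.0) BAROMETER:59.8"""

PATTERN = re.compile(
    r"GPS\(([-\d.]+),([-\d.]+),[-\d.]+\)\s+BAROMETER:([-\d.]+)")

def parse_entry(text):
    # Extract longitude, latitude and barometric height from one entry.
    lon, lat, height = map(float, PATTERN.search(text).groups())
    return {"longitude": lon, "latitude": lat, "height_m": height}

log = parse_entry(entry)
```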
Step 2: aligning multimedia files
After extracting the metadata of the multimedia files, large volumes of multimedia data can be aligned with the MAMMals functions that subset media files containing photo-identification data (Fig. 1c). Use the selectVideos or selectAudios functions to get the video and audio files of interest, respectively, by aligning their metadata with the metadata of the photographs of individuals (previously generated by the functions getVideoMetadata, getAudioMetadata and getPhotoMetadata, respectively). The select set of functions calculates the time of the photograph within the video or audio files for all photographs taken during the sampling event, and returns an R data frame with data matching the time in the video or audio files. Then, one can export an R data frame containing only the photo-identified individuals, or other events of interest, into a .csv or .txt file. We highlight that while these functions are based on photograph metadata, they also work with other text data in which events are correctly timestamped (Fig. 1c), such as observed behavioural events recorded in field notes, or GPS positions from loggers fitted to the animals.
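At its core, this alignment computes where each photograph falls within a recording. A minimal Python sketch of that logic (function names ours, not the package's):

```python
from datetime import datetime

def locate_in_video(photo_time, video_start, video_duration_s):
    # Seconds into the video at which the photo was taken, or None
    # when the photo falls outside the recording window.
    elapsed = (photo_time - video_start).total_seconds()
    return elapsed if 0 <= elapsed <= video_duration_s else None

video_start = datetime(2021, 6, 1, 9, 30, 0)
# A photo taken at 09:42:10 falls 730 s into a 20-min (1200 s) video.
pos_s = locate_in_video(datetime(2021, 6, 1, 9, 42, 10), video_start, 1200)
```

Applying this check to every photograph against every recording is what lets the select functions discard footage that contains no photo-identified individuals.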
Step 3: linking photographs with multimedia data
After aligning the metadata of the media files, the photo-identification data can be linked with video or audio files by trimming these media files (Fig. 1d) based on the information generated by the selectVideos and selectAudios functions. If the aim is to get a still from the video for every photo-identified individual, the getVideoFrame function can export a frame of the video at the moment each photo was taken. If the aim is to perform further video or audio analyses, one can export short clips around the time of each photo-identification for both video (getVideoClip) and audio files (getAudioClip). If sampling with drones, one can automatically link data from flight logs to every event exported by the selectVideos or selectAudios functions. The linkFlightToMetadata function returns an R data frame in which the number of lines equals the number of photo-identification photographs and the columns contain all available metadata. The linkMetadataToFlight function merges the media data with the flight data, returning an R data frame with all the flight logs, or a list with a data frame for each flight log.
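Clipping of this kind is typically delegated to FFmpeg, which MAMMals lists as a dependency. The sketch below builds an FFmpeg stream-copy command for a clip centred on a photo-identification timestamp; the helper name and the 6-min window are our own illustrative choices, not the getVideoClip interface.

```python
def ffmpeg_clip_args(infile, outfile, centre_s, half_window_s=180.0):
    # Build an FFmpeg call that stream-copies a clip centred on a
    # timestamp (half_window_s on each side; a 6-min clip by default).
    start = max(0.0, centre_s - half_window_s)
    end = centre_s + half_window_s
    return ["ffmpeg", "-ss", f"{start:.3f}", "-i", infile,
            "-t", f"{end - start:.3f}", "-c", "copy", outfile]

# Clip around a photo taken 730 s into the video: 550 s to 910 s.
args = ffmpeg_clip_args("DJI_0042.MP4", "clip_ID1248.MP4", centre_s=730.0)
```

Stream copying (`-c copy`) avoids re-encoding, so hundreds of clips can be cut from long recordings quickly and without quality loss.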
Step 4: syncing multimedia data
Finally, the multiple media data sources can be synchronized based on the intersection of their recording times (Fig. 1e). Using the function syncMedia, video and audio files that were sampled concurrently and selected by the selectVideos and selectAudios functions can be trimmed to match their time intersection, and merged into a single file or exported as separate media files. Other auxiliary text data (e.g. GPS trackers, heart rate loggers, flight logs) recorded simultaneously in the field can be synchronized based on the intersection of their sampling times and merged into a single text database using the function syncData, as long as the input clocks are precisely synced.
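The time-intersection rule can be sketched in a few lines; this is an illustration of the logic, not the syncMedia implementation:

```python
from datetime import datetime

def time_intersection(a_start, a_end, b_start, b_end):
    # Overlap window of two recordings, or None when they never overlap.
    start, end = max(a_start, b_start), min(a_end, b_end)
    return (start, end) if start < end else None

# A 09:30-09:50 video and a 09:45-10:05 audio share a 5-min window.
video = (datetime(2021, 6, 1, 9, 30), datetime(2021, 6, 1, 9, 50))
audio = (datetime(2021, 6, 1, 9, 45), datetime(2021, 6, 1, 10, 5))
window = time_intersection(*video, *audio)
```

Trimming both media files to this shared window is what guarantees that, say, a surfacing seen on the video and a vocalization heard on the audio refer to the same moment.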

Auxiliary functions for post-processing multimedia data
The MAMMals R package was designed to streamline the pre-processing of photo-identification and multimedia data; its workflow therefore does not include the post-processing of the biological data of interest. After linking the photographs with the useful parts of the videos and audio recordings, manual or semi-automatic extraction of the target data is required. This may include video playback to quantify behavioural states and events (e.g. Torres et al. 2018) or morphometry and health variables (e.g. Christiansen et al. 2020); automatic detection of species (e.g. Gray et al. 2019); or the photographic comparison needed to identify individual animals (e.g. Urián et al. 2015). To efficiently measure and extract such biological data from photo, video and audio data, we point the reader to the growing number of computational tools available elsewhere (e.g. Abràmoff et al. 2004; Friard and Gamba 2016; Beery et al. 2020; Schneider et al. 2018; Torres and Bierlich 2020; Bird and Bierlich 2020). We exemplify one case of post-processing behavioural data in the next section, but here we highlight that the MAMMals R package also contains functions and utilities to assist with the post-processing of the linked multimedia data or auxiliary data. For instance, one can use MAMMals to wrangle and convert information from the drone flight log data, such as gimbal and camera angles, GPS coordinates, and digital compass and barometer sensors. We conceptually divide these functions into data tools and visualization (Table 1), identified by the prefixes do and view, respectively. For instance, doCorrectAngle can be used to convert drone yaw from the −180° to 180° range to the 0° to 360° range, and the function viewFlightPath can be used to visualize a 2D drone flight path with photos as points, using data from linkMetadataToFlight.
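The yaw conversion just described amounts to a modulo operation; a one-line Python equivalent (the function name is ours, for illustration):

```python
def correct_yaw(yaw_deg):
    # Map a yaw reported in the -180..180 deg range onto 0..360 deg.
    # Python's % operator returns a non-negative result here.
    return yaw_deg % 360

# -90 deg (due west) becomes 270 deg; positive yaws are unchanged.
west = correct_yaw(-90)
```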

An illustrative case study
To illustrate the utility of the MAMMals R package, we used individual and behavioural data collected from a coastal bottlenose dolphin population in Laguna, southern Brazil, where some individual dolphins forage near the coast with net-casting fishers (Simões-Lopes et al. 1998). To explore the dolphins' foraging behaviour, we combined standard photo-identification with overhead video, recorded using a commercially available drone (DJI Mavic Pro) with a built-in high-resolution camera mounted on a gimbal. We hovered the drone over the study area above 60 m to minimize potential disturbance to the dolphins (Fettermann et al. 2019), and followed all safe-flight guidelines (Fiori et al. 2017; Raoult et al. 2020). The drone camera covered an area of ca. 7500 m², including the coast where the fishers wait for dolphins and ca. 60 m of the lagoon canal. Simultaneously, two photographers registered the dolphins' dorsal fins for posterior individual identification based on nicks, notches, scars and skin lesions, following photo-identification protocols (Hammond et al. 1990). One photographer, positioned ashore, used a DSLR Canon 60D camera equipped with a 100–400 mm lens to photograph all dolphins in the video footage area, while the second photographer stood on a 1.5-m platform 3 m behind the fishers and used a DSLR Canon 7D MkII, with built-in GPS and digital compass and a 70–300 mm lens, to identify the individual dolphins that approached the fishers to interact. This photographer was always captured in the drone footage and used a flash (Yongnuo) pointing up, so the timing of the photographs could be verified in the video to double-check that the clocks of the camera and drone were properly synced.
To illustrate two types of behavioural data that can be measured from the merged video and photo-identification dataset, we tracked (i) the foraging behaviour of individual dolphins in terms of distance and heading angles relative to the coast over time (Fig. 2a); and (ii) the foraging behaviour of a group of dolphins in terms of spatial cohesion and diving synchrony (Fig. 2b). In both cases, we used the MAMMals R package to automatically select examples of drone videos containing photo-identified dolphins from a total of 56.6 h of footage and 3614 photographs of 21 identified individual dolphins. First, we used the functions getPhotoMetadata and getVideoMetadata to extract and organize the metadata of photographs and videos, extracted the drone flight logs, and used some of the auxiliary functions to correct the angles of the drone footage (doConvertAngle, doCorrectCameraYaw) and to filter out flights that were too low (doFilterDroneHeight) or in which the camera was not pointed straight down (doFilterGimbalPitch).
To describe (i) the individual-level foraging, we then used the function selectVideos to identify drone videos taken when there were one or two dolphins at the interaction site, and the function getVideoClip to crop 6-min video clips around the photographs taken. Next, we manually processed these clips with the open-source software ImageJ (Abràmoff et al. 2004): each time the photo-identified dolphin surfaced to breathe, we used the 'straight line' tool to measure the distance of the dolphin from shore, and the 'angle' tool to measure the angle between the dolphin's heading and the shore. In videos with more than one dolphin at the site, we distinguished photo-identified dolphins recorded in the video at the same time but in different places using the angle (available in the metadata of the photographs) between the dolphin and the compass-equipped camera used for photo-identification. Finally, we converted the distances measured in pixels to meters based on a 1-m scale captured in the drone video, and converted the angles measured in degrees relative to the shore to radians, considering True North as a reference. In Fig. 2a, we present an example of these data on the distances and angles of a photo-identified individual dolphin foraging close to shore.
Fig. 2 Examples of individual- and group-level behaviour of photo-identified mammals extracted from overhead videos. a Tracking the foraging behaviour of individual coastal dolphins, in terms of distance and angle to the shore. The MAMMals package was used to automatically select and clip a video containing a solitary photo-identified dolphin (inset photo-identification). The video was then post-processed: the dolphin's distances (yellow lines in the picture; y-axis in the plot) and angles (cyan lines in the picture, with the middle point centred on the dolphin; arrows in the plot, whose colours indicate temporal sequence) relative to shore were measured each time it surfaced to breathe. Distances measured in pixels were converted to meters based on a 1-m scale placed behind the photographer; angles measured in degrees relative to the shore were converted to radians, considering True North as a reference. b Group cohesion and dive synchrony of photo-identified bottlenose dolphins, in terms of relative distance to each member and timing of surfacing. The MAMMals R package was used to select the photographs with the dolphins' dorsal fins for posterior identification of the 5 group members. The group of 5 dolphins was then tracked over time with a custom computer vision model trained to detect dolphins in drone videos. Cohesion was estimated as the average Euclidean distance among the centroids of all dolphins detected (i.e. the green rectangles with detection scores) every 0.2 s, converted to meters using a known 1-m scale captured in the video (not shown here). Synchrony was estimated as the time difference between detections. Pictures 1-4 illustrate a diving sequence of a subgroup of 5 dolphins, in which 1 individual is detected first, followed by three that surfaced simultaneously, and then by the fifth individual after a 2-s lag. Box plots present the distribution of mean distances and breath intervals (y-axes) across different numbers of simultaneous detections (circles) of dolphins at the surface (x-axes) during a ~20-min drone video
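The conversion from shore-relative angles to radians referenced to True North can be sketched as follows; the shoreline bearing used here is an arbitrary illustrative value, not the bearing of the Laguna study site.

```python
import math

# Hypothetical shoreline bearing (deg from True North); in practice this
# would be measured for the study site.
SHORE_BEARING_DEG = 90.0

def heading_to_radians(angle_from_shore_deg,
                       shore_bearing_deg=SHORE_BEARING_DEG):
    # Re-express a heading measured relative to the shoreline as
    # radians from True North.
    return math.radians((shore_bearing_deg + angle_from_shore_deg) % 360)

theta = heading_to_radians(45.0)  # 45 deg off a 90-deg shoreline
```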
To describe the group-level foraging, we used the functions selectVideos and getVideoClip to select the photographs of all dolphins foraging in groups and to trim the complete 20-min drone video into a shorter clip around the time the photos were taken. We first photo-identified individuals manually, and then measured group cohesion and dive synchrony, in terms of the relative distance between group members and the timing of surfacing. To do so, we used a convolutional neural network object-detection classifier (He et al. 2016) to automatically detect and count dolphins in the drone footage. We retrained a TensorFlow pre-trained classifier with the Faster R-CNN model architecture (Ren et al. 2015) using 838 drone video frames in which dolphins were manually labelled with LabelImg (Tzutalin 2015), and 200 other such images for testing the model. We then applied this supervised computer-vision model to detect and count the number of dolphins at every 0.2 s of the drone video, i.e. every 5 frames of a 25-fps video (for a similar approach, see Guo et al. 2020). We highlight that although we used machine learning to post-process the video clips, this procedure could also be done manually. For instance, one can extract short .avi clips with a frame rate of 1 fps using the getVideoClip function, and then import the clips into ImageJ to measure inter-individual distances and surfacing times. We considered the group more cohesive when individuals were closer together, and their diving more synchronous when their breath intervals were shorter. We measured group cohesion as the average Euclidean distance, in pixels, between the centroids of all dolphins detected in each frame, and converted these distances into meters based on a known 1-m scale recorded in the drone video. We measured diving synchrony as the time lag between detections, considering the group to be diving synchronously when more than one dolphin was detected in the same video frame. In Fig. 2b, we present these data on group cohesion and dive synchrony as the distribution of mean distances and breath intervals among different numbers of dolphins at the surface.
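The cohesion metric, the mean pairwise Euclidean distance among detection centroids converted to metres, can be sketched as follows (helper name and example coordinates are ours):

```python
import itertools
import math

def mean_pairwise_distance_m(centroids, metres_per_pixel):
    # Average Euclidean distance (in metres) among all pairs of
    # detection centroids (pixel coordinates) in a single frame.
    pairs = list(itertools.combinations(centroids, 2))
    mean_px = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return mean_px * metres_per_pixel

# Three detections in one frame; 0.025 m/px would follow from a
# 1-m scale spanning 40 px in the footage.
frame_centroids = [(0, 0), (30, 40), (60, 80)]
cohesion_m = mean_pairwise_distance_m(frame_centroids, 0.025)
```

Running this per frame (here, every 0.2 s) yields the cohesion time series summarized in the box plots of Fig. 2b.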

Caveats
The tools presented herein assist the organization of simultaneous sampling methods, but caveats exist. First, the level of detail of the outputs, be they the merged databases or the cropped and synced media, may depend on the accessibility of the study system. We have illustrated how the MAMMals tools work when recording and tracking coastal dolphins, but these tools could also be used to process multimedia of mammals individually identifiable from photographs taken from the ground or sea level (e.g. sperm and humpback whale caudal fins, or blue whale pigmentation; Hammond et al. 1990) and from overhead (e.g. the heads of right whales, or other identifiable body parts of marine and terrestrial mammals; Landeo-Yauri et al. 2020; Maeda et al. 2021). However, in our example, we had the advantage of keeping the photographer in the overhead video frame at all times, both for recording the position of the GPS-equipped camera as a reference point and for double-checking the synchrony between the video and photograph data streams. This setup is rather unusual for studies of free-ranging mammals and requires the sampling design to be adapted to the reality of other study systems. For example, boat-based focal follows of cetaceans could aim to keep the boat close to the group most of the time so the photographer remains in the overhead video frame, or overhead behavioural sampling of terrestrial mammals could focus on a relatively small open area.
The second limitation of our tools is that the precision of the link between the photo-identification and the other multimedia can depend on group size and group cohesion. In our example, we tracked solitary animals and small groups that could be easily photo-identified, but mismatches in individual identification can occur when collecting data from multiple individuals at the same time, such as in large and tight groups. Our drone videos can contain multiple individuals, so an individual photographed at a given time could potentially be linked to several individuals appearing in the drone video at that time. We resolved this by keeping the photographer in the overhead video frame and relying on the angle from the camera's built-in digital compass to tease apart individuals in the overhead footage. However, these decisions become increasingly difficult as the group size and the rate of pictures taken increase, and/or as the groups become tighter and closer to the photographer. In such situations, our tools can still help define the timestamps of sampling events to extract group-level (but not individual) data or to identify subgroups of animals.

Closing remarks
Our tools for streamlining the use of multimedia data with traditional individual identification methods are steps toward the integration of multiplatform behavioural sampling of free-ranging mammals. We acknowledge there is room for improvement and, to encourage further collective development of these tools, we provide all the code of the MAMMals package in an open repository (https://bitbucket.org/maucantor/mammals/). We hope to inspire further collective work in the scientific community to generalize the process of linking multiple sampling platforms and so refine the collection and processing of individual-based data. More importantly, we hope these computational tools improve the raw material needed to promote new insights on the population dynamics, ecological interactions and behaviour of free-ranging animals.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Table 1
Auxiliary functions provided in the MAMMals R package to assist data wrangling, conversion and visualization

doFilterGimbalPitch: Filter drone data to when the camera is pointing straight down, i.e. pitch = −90°
doCalcDistanceX and doCalcDistanceY: Calculate the horizontal (doCalcDistanceX) or vertical (doCalcDistanceY) distance of an object marked in any image tool (e.g. ImageJ); this distance is used to transform the marked object into a GPS position using the origin point as reference
doNewLatitude: Calculate the latitude of an object given the distance and angle to a reference point, using photogrammetry data
doNewLongitude: Calculate the longitude of an object given the distance and angle to a reference point, using photogrammetry data
doAngleToDec: Convert the degree-min-sec format (e.g. 28° 29′ 44.77″ S) into decimal degrees
doCalcRadiusEarth: Calculate the radius of the Earth at any given latitude and altitude
viewFlightPath: Plot the drone flight path and relate it to photo-identification data, if available
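As an example of these conversions, the degree-min-sec transformation performed by a function like doAngleToDec reduces to simple arithmetic; below is a Python equivalent applied to the table's example value (the function name is ours):

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    # Convert degree-min-sec (e.g. 28 deg 29' 44.77'' S) into decimal
    # degrees; southern and western hemispheres are negative.
    decimal = degrees + minutes / 60 + seconds / 3600
    return -decimal if hemisphere in ("S", "W") else decimal

lat = dms_to_decimal(28, 29, 44.77, "S")  # approx. -28.49577
```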