Background

Animal-attached sensors are increasingly used to study the movement, behaviour, and physiology of both wild and domestic species. Such devices have provided valuable insights into areas such as bioenergetics, animal welfare, and conservation [1,2,3,4]. Data from animal-attached sensors are often overlaid on satellite imagery and maps to provide spatial and environmental context [5,6,7]. Current attempts to obtain more detailed visual information from a study animal’s perspective have typically involved attaching standard RGB cameras [8,9,10] (but see [11, 12]). Information is then often extracted from images or footage by manual inspection. Few attempts have been made to apply computer vision techniques to non-human biologging studies. Two notable examples include the use of object tracking on the prey of falcons and template matching to determine the head position of sea turtles, though in both cases using 2D images [13, 14].

In other fields, light-based depth sensors have been used for the reconstruction of 3D scenes, with numerous applications in areas such as robotics, mapping, navigation, and interactive media [15,16,17,18]. Active depth sensing encompasses a range of techniques that examine the properties of projected light, enabling the construction of 3D point clouds (sets of measured points within a common coordinate system). In a detection system known as LiDAR (light detection and ranging), depth may be derived by measuring either the time of flight or the phase shift of light projected from a source onto an environment of interest and reflected back to a receiver [19]. Alternatively, a defined pattern (e.g. stripes or dots) can be projected onto a scene and its apparent deformation used to recover depth and surface shape, a technique known as structured light (SL) [20, 21].
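To make the two measurement principles concrete, the sketch below (Python, purely illustrative; the function names and example values are assumptions rather than any particular sensor's implementation) shows how a depth value follows from a round-trip time in a time-of-flight system, and from pattern disparity by triangulation in a structured-light system.

```python
# Illustrative sketch of the two depth-measurement principles described above.
# Values and function names are hypothetical; real sensors perform these steps in hardware.

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def depth_from_time_of_flight(round_trip_time_s: float) -> float:
    """LiDAR-style depth: light travels to the surface and back,
    so the one-way distance is half the round-trip distance."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

def depth_from_structured_light(focal_length_px: float,
                                baseline_m: float,
                                disparity_px: float) -> float:
    """Structured-light depth by triangulation: the projected pattern appears
    shifted (disparity) between projector and camera, and depth is inversely
    proportional to that shift."""
    return focal_length_px * baseline_m / disparity_px

# Example: a 10 ns round trip corresponds to roughly 1.5 m.
print(depth_from_time_of_flight(10e-9))
# Example: f = 580 px, baseline = 0.075 m, disparity = 14.5 px -> 3.0 m.
print(depth_from_structured_light(580.0, 0.075, 14.5))
```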

Airborne LiDAR sensors have been used in a number of ecological studies, generating 3D models from point clouds that are typically used to investigate relationships between animal diversity and quantifiable attributes of vegetation and topography [22]. Although not obtained directly from animal-attached sensors, previous studies have integrated airborne LiDAR data with information from GPS-equipped collars to examine factors underlying animal movement patterns including habitat structure, social interactions, and thermoregulation [23,24,25]. In addition, aerial-derived point cloud models have been used to quantify the visible area or ‘viewshed’ of lions at kill sites, leading to insights on predator–prey relationships [11, 12]. Various other biological applications of active depth sensing include terrestrial and aerial vegetation surveys in forestry research and the automated identification of plant species [26,27,28,29]. Structured light sensors have also successfully been used to scan animals (e.g. cattle) from a fixed position and in milking robots [30, 31]. Perhaps similar to how demand in consumer electronics helped drive the availability of low-cost portable sensors such as accelerometers [32], active depth sensors are now beginning to appear in relatively small mobile devices which may further aid their adoption in research.

A separate and potentially complementary technique known as visual odometry (VO) can be used to estimate motion by tracking visual features over time [33]. Fusing visual tracking with inertial data from accelerometers and gyroscopes (visual–inertial odometry, VIO) further improves estimates of position and orientation (pose), and the approach has gained popularity in the field of robotics as a method of localisation in areas where GPS is intermittent or unavailable [34,35,36]. In addition to indoor environments, VIO could be of use in areas with dense vegetation or challenging terrain [37, 38].
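As a rough illustration of the inertial half of this fusion, a pose estimate can be propagated between camera frames by integrating the inertial measurements, with the visual front end then correcting the accumulated drift. The following is a minimal sketch only, assuming gravity-compensated, world-frame accelerations and hypothetical sample values; real VIO systems perform the fusion with filtering or sliding-window optimisation.

```python
import numpy as np

def propagate_position(p, v, accel_world, dt):
    """Constant-acceleration integration for one IMU sample (gravity already
    removed and acceleration already rotated into the world frame)."""
    p_new = p + v * dt + 0.5 * accel_world * dt ** 2
    v_new = v + accel_world * dt
    return p_new, v_new

# Hypothetical 100 Hz IMU samples between two ~5 Hz visual updates.
dt = 0.01
p = np.zeros(3)
v = np.array([0.5, 0.0, 0.0])                     # walking forward at 0.5 m/s
for accel in np.tile([0.1, 0.0, 0.0], (20, 1)):   # slight forward acceleration
    p, v = propagate_position(p, v, np.asarray(accel, dtype=float), dt)
print(p)  # position predicted at the next visual frame
```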

This report trials the application of an animal-attached active depth-sensing (SL-based) and motion-tracking (VIO) device to record fine-scale movement within a reconstructed 3D model of the surrounding environment, without reliance on GPS. A segmentation pipeline is also demonstrated for the identification of neighbouring objects, testing the feasibility of such technology to investigate factors that influence animal behaviour and movement in an outdoor environment.

Methods

A one-year-old female Labrador–golden retriever (Canis familiaris, body mass 32 kg) was used as the subject during the trial. The study took place in Northern Ireland, during late August, in a section of mature coniferous forest (primarily Sitka spruce, Picea sitchensis) under the canopy. Trees had few remaining lower branches, and the forest floor was relatively flat with a blanket of dry pine needle litter. This environment was selected because the segmentation and identification of trees from point clouds had previously proven successful in forestry research, and the canopy avoided potential interference from direct sunlight. The initial trial area measured 30 m in length and approximately 4.5 m in width, the outer edges of which were marked by placing large (\(0.76\times 1.02\,\hbox {m}\)) sheets of cardboard. The circumference/girth of each tree was measured at a consistent height of 1.0 m (or at the narrowest point below a split trunk) using a fibreglass tape measure to ensure clearance of root flare.

The recording device was a Project Tango Development Kit tablet (‘Yellowstone’, NX-74751; Google Inc., CA, USA) running Android version 4.4.2 and Project Tango Core version 1.34 [39]. The Tango device projects SL onto surrounding objects and surfaces using an infrared (IR) laser (see Fig. 1a). This light is then detected by an RGB-IR camera (Fig. 1b) to measure depth, which can be represented in point cloud form (Fig. 1c). Visual information is obtained from a fisheye camera (Fig. 1d), from which image features are tracked between frames during motion (Fig. 1e). These visual data are fused with data from inertial sensors (tri-axial accelerometer and gyroscope, Fig. 1f) to track pose by VIO with reference to an initial starting point or origin (e.g. [34, 36]; Fig. 1g). Combining depth sensing with motion tracking allows point clouds to be accumulated over time and enables the reconstruction of an environment in three dimensions. Data were recorded on the device using ParaView Tango Recorder [40] (with a minor modification to record every frame of the point cloud data, rather than every third) in ‘Auto Mode’, which records both point cloud and pose data to internal storage. Depth was recorded at a frame rate of approximately 5 Hz (with depth values mostly between 0.5 and 4.0 m), and pose estimates were returned at 100 Hz.
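Accumulating depth frames into a single world-frame model relies on pairing each ~5 Hz depth frame with a pose estimate and transforming its points accordingly. The sketch below illustrates this pairing and transformation; the timestamps, variable names, and nearest-pose lookup are assumptions, as this handling is actually performed by the ParaView Tango Recorder filters described below.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def nearest_pose(pose_times, poses, frame_time):
    """Pick the 100 Hz pose sample (translation, quaternion) closest in time
    to a ~5 Hz depth frame. `poses` is assumed to be a list of
    (translation, quaternion_xyzw) tuples."""
    i = int(np.argmin(np.abs(pose_times - frame_time)))
    return poses[i]

def transform_points(points_device, translation, quaternion_xyzw):
    """Rotate and translate device-frame points (N x 3) into the world frame."""
    R = Rotation.from_quat(quaternion_xyzw).as_matrix()
    return points_device @ R.T + translation

# Accumulating frames then simply means concatenating the transformed points
# of each depth frame into one growing array, e.g.:
# world_cloud = np.vstack([transform_points(f.points,
#                                           *nearest_pose(pose_times, poses, f.time))
#                          for f in frames])
```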

The device was mounted onto the dorsal area of the dog using a harness (Readyaction™ Dog Harness plus a ‘Sport 2’ attachment) with the depth-sensing and motion-tracking cameras facing forward. The combined weight of the recording device and harness was 652 g, corresponding to 2.0% of body mass. The animal was first held motionless for several seconds while the device initialised. The dog was then guided on a lead in a straight line across the initial study area at a steady walking pace (Fig. 2). Following this, a second stage of the trial was performed by guiding the animal through the forested area along a non-predetermined path over more challenging terrain for an extended period. Both stages of the trial were carried out between late afternoon and early evening.

The compressed files containing the recorded data were then downloaded from the device. Data were visualised and analysed in ParaView/PvBatch version 4.1.0 [41], using the Point Cloud Library (PCL) plugin version 1.1 [42] (with modifications, including additional orientation constraints on model fitting and options for the segmentation of ground points) built with PCL version 1.7.2 [43]. Filters distributed with ParaView Tango Recorder were used to prepare the imported data as follows: 1) depth points were transformed to align with the pose data (‘Apply Pose To PointCloud’ filter); 2) point clouds were accumulated over time to produce an overall model of the study site (‘Accumulate Point Clouds over time’ filter, see Fig. 1h); 3) the orientation of the device was obtained by applying the ‘Convert Quaternion to Orientation Frame’ filter to the pose data. The bounds of the study area were identified by visually locating the cardboard markers in the 3D model, and points that fell outside were removed using ParaView Clip filters. The PCL Radius Outlier Removal filter was then used to label points with fewer than 10 neighbours within a search radius of 0.3 m (see Fig. 1i). Outliers were subsequently removed using the Threshold Points passthrough filter. To reduce processing time and obtain a more homogeneous point density, the point cloud was then downsampled using the PCL Voxel Grid filter with a leaf size of 0.02 m (Fig. 1j). To aid in viewing the structure of the point cloud, the Elevation filter was applied to colour points by height. To provide a ground truth, each point was assigned an identification number, and those corresponding to tree trunks were interactively isolated and annotated by frustum selection.
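The clipping, outlier-removal, and downsampling steps can also be reproduced outside ParaView. The following sketch uses the open-source Open3D library rather than the PCL plugin used in the study, with the parameter values reported above; the input file and bounding-box coordinates are placeholders.

```python
import numpy as np
import open3d as o3d

# Load an accumulated, world-frame point cloud (hypothetical file name).
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(np.load("accumulated_cloud.npy"))

# 1) Clip to the manually identified bounds of the study area (placeholder bounds).
bounds = o3d.geometry.AxisAlignedBoundingBox(min_bound=(-2.5, -2.5, -1.0),
                                             max_bound=(32.5, 2.5, 5.0))
pcd = pcd.crop(bounds)

# 2) Remove sparse outliers: keep points with at least 10 neighbours within 0.3 m.
pcd, kept_indices = pcd.remove_radius_outlier(nb_points=10, radius=0.3)

# 3) Downsample with a 0.02 m voxel grid for speed and more homogeneous density.
pcd = pcd.voxel_down_sample(voxel_size=0.02)
print(len(pcd.points))
```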

Following this, a progressive morphological filter [44] was applied using PCL for the identification and segmentation of ground points (cell size 0.2 m, maximum window size 20, slope 1.0, initial distance 0.25 m, maximum distance 3.0 m; Fig. 1k). The points labelled as ground were removed using a Threshold Points filter. The PCL Euclidean Cluster filter was then applied to extract clusters of points representing potential objects (cluster tolerance \(0.1\hbox { m}\), minimum cluster size 300, maximum cluster size 50,000; Fig. 1l).
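Euclidean cluster extraction amounts to grouping points into single-linkage connected components under a distance tolerance and discarding groups outside the size limits. A minimal numpy/scipy sketch of this idea, using the parameters above but not the PCL implementation itself, is shown below.

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_clusters(points, tolerance=0.1, min_size=300, max_size=50_000):
    """Group points into connected components where neighbours are linked
    if they lie within `tolerance` metres of each other."""
    tree = cKDTree(points)
    unvisited = np.ones(len(points), dtype=bool)
    clusters = []
    for seed in range(len(points)):
        if not unvisited[seed]:
            continue
        queue, members = [seed], []
        unvisited[seed] = False
        while queue:
            idx = queue.pop()
            members.append(idx)
            for nb in tree.query_ball_point(points[idx], r=tolerance):
                if unvisited[nb]:
                    unvisited[nb] = False
                    queue.append(nb)
        if min_size <= len(members) <= max_size:
            clusters.append(np.asarray(members))
    return clusters

# Example usage after ground points have been removed:
# clusters = euclidean_clusters(np.asarray(pcd.points))
```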

Random sample consensus (RANSAC) is an iterative method used to estimate parameters of a model from data in the presence of outliers [45]. In the case of point clouds, this allows for the fitting of primitive shapes and derivation of their dimensions. For each Euclidean cluster, an attempt was made to fit a cylindrical model (corresponding to a tree trunk) using the PCL SAC Segmentation Cylinder filter (normal estimation search radius 0.1 m, normal distance weight 0.1, radius limit 0.3 m, distance threshold 0.25 m, maximum iterations 200). This filter was modified to search for cylinders only along the vertical axis, allowing for slight deviations (angle epsilon threshold \(15.0^\circ\); Fig. 1m). The precision [true-positive points/(true-positive points + false-positive points)], recall [true positives/(true positives + false negatives)], and F\(_{\beta }\) score of tree trunk segmentation were calculated using scikit-learn version 0.17.1 [46] in Python. Differences between ground-truth measurements of tree trunk girth and the RANSAC model-derived values were analysed using a Wilcoxon signed-rank test in R version 3.3.1 [47]. The device position obtained from the pose data was used to plot the trajectory of a first-person view of the study animal moving through the accumulated point cloud. In this animation, the camera focal point was fixed on the final pose measurement and the up direction set to the vertical axis (Additional file 1).
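Once the cylinder axis is constrained to be near vertical, fitting a trunk essentially reduces to a RANSAC circle fit on the cluster's points projected onto the horizontal plane. The sketch below illustrates that idea in plain numpy as a simplified stand-in for the modified PCL filter (not its code), using the distance threshold and radius limit reported above; point-level scores could then be obtained with scikit-learn's precision/recall/F-beta functions, and the radius comparison with a Wilcoxon signed-rank test (scipy.stats.wilcoxon, or R as in the study).

```python
import numpy as np

def fit_circle_3pts(p1, p2, p3):
    """Circumcircle of three 2D points -> (centre, radius); None if collinear."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    centre = np.array([ux, uy])
    return centre, np.linalg.norm(p1 - centre)

def ransac_vertical_cylinder(points_xyz, dist_thresh=0.25, radius_limit=0.3,
                             max_iter=200, rng=np.random.default_rng(0)):
    """RANSAC fit of a vertical cylinder: sample 3 points, fit a circle to their
    horizontal (XY) projection, and count points whose horizontal distance to
    the circle is within the threshold."""
    xy = points_xyz[:, :2]
    best_inliers, best_model = None, None
    for _ in range(max_iter):
        sample = xy[rng.choice(len(xy), 3, replace=False)]
        model = fit_circle_3pts(*sample)
        if model is None or model[1] > radius_limit:
            continue
        centre, radius = model
        residuals = np.abs(np.linalg.norm(xy - centre, axis=1) - radius)
        inliers = np.flatnonzero(residuals < dist_thresh)
        if best_inliers is None or len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (centre, radius)
    return best_model, best_inliers  # (centre, radius) in metres, inlier indices
```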

Results

During the first stage of the trial, a total of 1.1 million depth points and 5264 pose estimates were recorded over a period of 53 s, with a file size of 14.8 MB. After applying the Clip filter to the bounds of the study area, the number of points was reduced to 869,353. The radius outlier filter removed a total of 559 points (0.1% of the clipped point cloud). After application of the voxel grid filter, the point cloud was further reduced to a total of 324,074 points. The progressive morphological filter labelled 158,574 points (48.9%) as belonging to the forest floor. A total of 30 Euclidean clusters were identified with a median of 4931 (interquartile range 3153–6880) points (1342 points were not assigned to any cluster). Overall, 28 of these clusters yielded a RANSAC vertical cylinder model fit and were therefore classed as tree trunks at the object level. On visual inspection, there were no occurrences of object-level false positives. With 30 trees present in the study area, this corresponded to an accuracy of 93.3%. Individual points were labelled with a precision, recall, and F\(_{\beta }\) score of 1.00, 0.88, and 0.93, respectively (see Fig. 3). The Wilcoxon signed-rank test revealed no significant difference (\(V = 225\), p = 0.63; median difference 0.01 m, interquartile range −0.02 to 0.02 m) between the actual tree trunk radii (assuming circularity; mean 0.11 ± \(0.04\,\hbox {m}\)) and the RANSAC coefficient-derived estimates (median 0.11 m, interquartile range 0.10–0.13 m). Segmented clusters representing tree trunks were found to have a mean height (minimum to maximum vertical distance between inlier points) of 1.95 ± 0.52 m.

During the extended stage of the trial, a total of 4.17 million depth points and 12,761 pose estimates were recorded over a 2-min period (file size 55.3 MB). After accumulation into a single point cloud, removal of outliers, and downsampling, a total of 83 tree trunks were identified by RANSAC with a median coefficient-derived radius of 0.11 m (interquartile range 0.09–0.13 m). The device pose indicated a travel distance of 81.92 m, a Euclidean (straight-line) distance of 62.48 m, and a vertical descent of 8.70 m. See Additional files 2 and 3 for interactive views of the segmented point clouds from both stages of the trial.
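For reference, these summary movement statistics follow directly from the positions in the pose stream; a brief sketch (the input file name and array layout are assumptions) is shown below.

```python
import numpy as np

# Assumed (N, 3) array of XYZ positions in metres, one row per pose sample.
positions = np.load("pose_positions.npy")

# Travel distance: sum of consecutive segment lengths along the trajectory.
travel_distance = np.sum(np.linalg.norm(np.diff(positions, axis=0), axis=1))
# Euclidean (straight-line) distance: displacement from start to end.
euclidean_distance = np.linalg.norm(positions[-1] - positions[0])
# Vertical descent: drop along the vertical axis (assumed to be the third column).
vertical_descent = positions[0, 2] - positions[-1, 2]

print(travel_distance, euclidean_distance, vertical_descent)
```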

Discussion

Movement is of fundamental importance to life, impacting key ecological and evolutionary processes [48]. From an energetics perspective, the concept of an ‘energy landscape’ describes the variation in energy requirements that an animal experiences while moving through an environment [49, 50]. For terrestrial species, heterogeneity in the energy landscape depends on the properties of the terrain, with animals predicted to select movement paths that allow them to minimise costs and maximise energy gain. At an individual level, an animal may also deviate from landscape model predictions as it undergoes fitness-related trade-offs in seeking short-term optimality (e.g. for predator avoidance) [51]. In a previous accelerometer and GPS biologging study on the energy landscapes of a small forest-dwelling mammal (Pekania pennanti) [52], energy expenditure was found to be related to habitat suitability. However, it was not possible to identify the environmental characteristics that influenced individual energy expenditure, highlighting a need for methods that can record environmental information from the perspective of a study animal at higher temporal and spatial resolutions.

In the present study, an animal-attached depth-sensing and motion-tracking device was used to construct 3D models of an animal’s surroundings and to segment and identify specific objects within them. Animal-scale 3D environmental models collected from free-ranging individuals have great potential for measuring ground inclination, detecting obstacles, and deriving various surface roughness or traversability indices (e.g. [53,54,55]). Such variables could then be examined in relation to accelerometer-derived proxies of energy expenditure to further our understanding of the ‘energetic envelope’ [51] within which an animal may optimise its behavioural patterns. Furthermore, VIO-based motion tracking could be used to test widely debated random walk models of animal foraging and search processes [56,57,58,59].

The segmentation of point clouds and fitting of cylindrical models by RANSAC enabled the labelling and characterisation of tree trunks surrounding the study animal. While such an approach proved suitable for the environment in which the current study took place, more varied scenes would require the testing of alternative features and classification algorithms in order to distinguish between a wider range of objects (e.g. [60,61,62,63,64]). The ability to accurately model and identify specific objects from the perspective of an animal, while simultaneously tracking motion, could have wide-ranging biotelemetry applications, such as studying the movement ecology of elusive or endangered species and investigating potential routes of disease transmission.

Susceptibility to interference from direct sunlight presents a significant challenge to the SL depth-sensing method used in the current study. While this did not greatly influence the results of the present trial, which was conducted under the canopy of a coniferous forest, future outdoor applications of animal-attached depth sensing may need to explore the use of passive solutions. For example, previous work has produced promising results on outdoor model reconstruction using mobile devices to perform motion stereo, which is insensitive to sunlight and also notably improved the range of depth perception [65, 66]. A hybrid approach, using both SL and stereo reconstruction, may provide advantages, particularly when measuring inclined surfaces [67]. In addition, the motion-tracking camera of the device used in the current study requires sufficient levels of visible light for pose estimation. This could impede deployments of a similar device on nocturnal species or those that inhabit areas with poor lighting conditions. One solution may be to utilise IR (or multi-spectral) imaging, which has previously been demonstrated for both VO [68,69,70] and stereo reconstruction [71].

Over time, pose estimates obtained by VIO alone can be prone to drift, potentially leading to misalignment of point clouds. Future work may therefore attempt to use visual feature tracking algorithms to recognise areas that have been revisited (i.e. within an animal’s home range) and perform drift correction or loop closure [72,73,74]. Such a feature, known as ‘Area Learning’ on the Tango platform, could allow researchers to visit and ‘learn’ an area in advance to produce area description files that correct errors in trajectory data. The application of such techniques in outdoor environments, across seasons, under a range of weather conditions, and at different times of day is challenging and the subject of active research [75, 76]. For other forest-based evaluations of the platform’s accuracy, see [77, 78]. Physical factors that could impact the performance of motion tracking or SL depth sensing include a lack of visual features and the reflective properties of surfaces [79,80,81]. The raw point clouds used in the present study disregard non-surface information and can be susceptible to sensor noise. Future work may therefore seek to improve the quality of 3D reconstruction by experimenting with alternative techniques such as optimised variants of occupancy grid mapping and truncated signed distance fields (TSDF), which use information from rays passing through free space to provide more detailed volumetric information [82, 83]. Ideally, the performance of motion tracking and depth sensing would also be tested across a wider range of environments, vegetation types, and movement speeds to more closely emulate conditions found in more challenging field deployments. It may be possible to reconstruct occluded and unobserved regions of models using hole-filling techniques (e.g. [84]). In relatively dynamic scenes, depth sensors have been used to track the trajectory of objects in motion (e.g. humans [85, 86]). The removal of dynamic objects from 3D models generated by mobile devices has also been demonstrated [87].

The development kit used in the current study was primarily intended for indoor use, with a touch screen to allow human interaction. Therefore, careful consideration would be needed before routine deployments of a device with similar capabilities on other terrestrial animals. When preparing such a device, particular attention should be focused on the mass, form factor, and attachment method in order to reduce potential impact on the welfare and behaviour of a wild study animal [88, 89]. Whenever possible, applications should attempt to gracefully handle adverse situations such as temporary loss of motion tracking due to objects obstructing the cameras at close range, or sudden movements overloading the sensors. Additionally, onboard downsampling of point cloud data could reduce storage requirements over longer deployments. Official support for the device used in the present study has now ended (it has been partially succeeded by ARCore [90]); however, the general concepts of combined depth sensing and VIO motion tracking are not vendor specific. Active depth-sensing capabilities can be added to standard mobile devices using products such as the Occipital Structure Sensor [91]. Various open-source implementations of VO motion-tracking algorithms may also be suitable for deployment with further development (e.g. [92]). Future work may seek to compare the performance of (or augment) animal-attached VIO motion tracking with previously described magnetometer, accelerometer, and GPS-based dead-reckoning methods (e.g. [93]) in a range of environments. For example, under dense vegetation where the performance of GPS can deteriorate, VIO could offer significant advantages [38, 94]. The simultaneous collection of tri-axial accelerometer data would also allow the classification of animal behaviour along the pose trajectory within reconstructed models. This could enable further research into links between specific behaviours and various structural environmental attributes. Compared with point clouds obtained from airborne laser scanning, animal-attached depth sensing, like terrestrial laser scanning, could provide a higher point density at viewing angles more appropriate for resolving small or vertical objects [95].

Conclusions

The application of an animal-attached active depth-sensing and motion-tracking device enabled environmental reconstruction with 3D point clouds. Model segmentation allowed the semantic labelling of objects (tree trunks) surrounding the subject animal in a forested environment with high precision and recall, resulting in reliable estimates of their physical properties (i.e. radius/circumference). The simultaneous collection of depth information and device pose allowed for reconstruction of the animal’s movement path within the study site. Whilst this case report discussed the technical challenges that remain prior to routine field deployment of such a device in biotelemetry studies, it demonstrates that animal-attached depth sensing and VIO motion tracking have great potential as visual biologging techniques to provide detailed environmental context to animal behaviour.

Fig. 1

Flow diagram illustrating device operation and point cloud processing. Device operation, including the depth-sensing and motion-tracking sensors, is shown in grey. The accumulated point cloud was then processed through a segmentation pipeline (white).

Fig. 2

The study animal with harness-mounted Tango device. The study animal (domestic dog) at the site of the first stage of the trial. The device was positioned dorsally using a dog harness with the depth-sensing and motion-tracking cameras facing forward.

Fig. 3

Accumulated point cloud and pose from an animal-attached device in a forest environment. Trimmed point cloud from the first stage of the trial. Ground points were labelled using a progressive morphological filter and coloured by height using an elevation filter. Tree trunks labelled by RANSAC are highlighted in green (28 of 30). The position of the device over time and its orientation are represented by the white line and the superimposed arrows, respectively. The corner axes indicate the orientation of the point cloud. Panels (b) and (c) show side and top-down views.